Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 14:37, Austin S. Hemmelgarn wrote:
> On 2017-11-14 07:43, Austin S. Hemmelgarn wrote:
>> On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:
>>> On 2017-11-13 16:42, Jean-Louis Martineau wrote:
>>>> On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:
>>>>> driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0
>>>>> FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"
>>>>
>>>> Does that dump still exist on tape Home-0001? Find it with amfetchdump.
>>>> If yes, send me the taper debug file.
>>>
>>> amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there.
>>
>> Just tried an amcheckdump on everything; it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken).
>
> No luck changing compression. I would suspect some issue with NFS, but I've started seeing the same symptoms on my laptop as well now (which is completely unrelated to any of the sets at work, apart from having an almost identical configuration other than paths and the total number of tapes).

So, I finally got things working by switching from:

    storage "local-vtl"
    vault-storage "cloud"

to:

    storage "local-vtl" "cloud"

and removing the "vault" option from the local-vtl storage definition.

Strictly speaking, this is working around the issue instead of fixing it, but it fits within what we need for our usage, and actually makes the amdump runs complete faster (since dumps get taped to S3 in parallel with getting taped to the local vtapes). Based on this, and on the corrupted dumps that amcheckdump was reporting, I think the issue is probably an interaction between the vaulting code and the regular taping code, but I'm not certain. Thanks for the help.
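For readers comparing the two setups, the change described above can be laid out as an amanda.conf fragment. This is only a sketch built from the parameter names given in this message: the storage definitions themselves are not shown in the thread, so they appear here only as comments, and the colon after `storage` in the original wording is assumed to be a typo.

```
# Before: dumps go to "local-vtl" and are vaulted to "cloud" afterwards.
storage "local-vtl"
vault-storage "cloud"
# (the "local-vtl" storage definition also carried a "vault" option;
#  its arguments are not shown in the thread)

# After: both storages are written during the amdump run itself.
storage "local-vtl" "cloud"
# (the "vault" option is removed from the "local-vtl" storage definition)
```

With the second form no separate vaulting pass is involved, which matches the observation that the runs finish faster.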
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 07:43, Austin S. Hemmelgarn wrote:
> On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:
>> On 2017-11-13 16:42, Jean-Louis Martineau wrote:
>>> On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:
>>>> driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0
>>>> FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"
>>>
>>> Does that dump still exist on tape Home-0001? Find it with amfetchdump.
>>> If yes, send me the taper debug file.
>>
>> amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there.
>
> Just tried an amcheckdump on everything; it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken).

No luck changing compression. I would suspect some issue with NFS, but I've started seeing the same symptoms on my laptop as well now (which is completely unrelated to any of the sets at work, apart from having an almost identical configuration other than paths and the total number of tapes).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:
> On 2017-11-13 16:42, Jean-Louis Martineau wrote:
>> On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:
>>> driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0
>>> FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"
>>
>> Does that dump still exist on tape Home-0001? Find it with amfetchdump.
>> If yes, send me the taper debug file.
>
> amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there.

Just tried an amcheckdump on everything; it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-13 16:42, Jean-Louis Martineau wrote:
> On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:
>> driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0
>> FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"
>
> Does that dump still exist on tape Home-0001? Find it with amfetchdump.
> If yes, send me the taper debug file.

amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there.
Re: Odd non-fatal errors in amdump reports.
On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:
> driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0
> FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"

Does that dump still exist on tape Home-0001? Find it with amfetchdump.
If yes, send me the taper debug file.

Jean-Louis

This message is the property of CARBONITE, INC. and may contain confidential or privileged information. If this message has been delivered to you by mistake, then do not copy or deliver this message to anyone. Instead, destroy it and notify me by reply e-mail
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 12:52, Jean-Louis Martineau wrote:
> The previous patch broke something.
> Try this new set2-r2.diff patch

Unfortunately, that doesn't appear to have fixed it, though the errors look different now. I'll try and get the log scrubbed by the end of the day and post it here.

> On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote:
>> On 2017-11-10 08:27, Jean-Louis Martineau wrote:
>>> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
>>>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>>>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>>>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>>>>> Austin,
>>>>>>>
>>>>>>> It's hard to say anything with only the error message.
>>>>>>>
>>>>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>>>>
>>>>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>>>>
>>>>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>>>>
>>>>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>>>>
>>>>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>>>>   client9 /var lev 0 FLUSH [File 0 not found]
>>>>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>>>>
>>>>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>>>>
>>>>> Is Server-01 still there? Does it still contain the dump?
>>>>
>>>> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
>>>
>>> Austin,
>>>
>>> Can you try to see if amfetchdump can restore it?
>>>
>>>  * amfetchdump CONFIG client2 /boot 20171024084159
>>
>> amfetchdump doesn't see it, and neither does amrecover, but the files for the given parts are definitely there (I know for a fact that the dump in question has exactly one part, and the file for that does exist on the virtual tape mentioned in the log file).
>>
>> I'm probably not going to be able to check more on this today, but I'll likely be checking if amrestore and amadmin find can see them.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 12:52, Jean-Louis Martineau wrote:
> The previous patch broke something.
> Try this new set2-r2.diff patch

Given that the switch to NFSv4 combined with a change to the labeling scheme fixed the other issue, I'm going to re-test these two sets with the same changes before I test the patch, just so I've got something current to compare against. I should have results from that later today, and will likely be testing this patch tomorrow if things aren't resolved by the other changes (and based on what you've said and what I've seen, I don't think the switch to NFSv4 or the labeling change will fix this one).

> Jean-Louis
>
> On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote:
>> On 2017-11-10 08:27, Jean-Louis Martineau wrote:
>>> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
>>>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>>>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>>>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>>>>> Austin,
>>>>>>>
>>>>>>> It's hard to say anything with only the error message.
>>>>>>>
>>>>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>>>>
>>>>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>>>>
>>>>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>>>>
>>>>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>>>>
>>>>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>>>>   client9 /var lev 0 FLUSH [File 0 not found]
>>>>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>>>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>>>>
>>>>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>>>>
>>>>> Is Server-01 still there? Does it still contain the dump?
>>>>
>>>> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
>>>
>>> Austin,
>>>
>>> Can you try to see if amfetchdump can restore it?
>>>
>>>  * amfetchdump CONFIG client2 /boot 20171024084159
>>
>> amfetchdump doesn't see it, and neither does amrecover, but the files for the given parts are definitely there (I know for a fact that the dump in question has exactly one part, and the file for that does exist on the virtual tape mentioned in the log file).
>>
>> I'm probably not going to be able to check more on this today, but I'll likely be checking if amrestore and amadmin find can see them.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:45, Austin S. Hemmelgarn wrote:
> On 2017-11-10 08:27, Jean-Louis Martineau wrote:
>> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
>>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>>>> Austin,
>>>>>>
>>>>>> It's hard to say anything with only the error message.
>>>>>>
>>>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>>>
>>>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>>>
>>>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>>>
>>>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>>>
>>>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>>>   client9 /var lev 0 FLUSH [File 0 not found]
>>>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>>>
>>>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>>>
>>>> Is Server-01 still there? Does it still contain the dump?
>>>
>>> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
>>
>> Austin,
>>
>> Can you try to see if amfetchdump can restore it?
>>
>>  * amfetchdump CONFIG client2 /boot 20171024084159
>
> At the moment, I'm re-testing things after tweaking some NFS parameters for the virtual tape library (apparently the FreeNAS server that's actually storing the data didn't have NFSv4 turned on, so it was mounted with NFSv3, which we've had issues with before on our network), so I can't exactly check immediately, but assuming the problem repeats, I'll do that first thing once the test dump is done.

It looks like the combination of fixing the incorrect labeling in the config and switching to NFSv4 fixed this particular case.
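Since the NFS version turned out to matter here, a quick way to confirm which version a vtape mount actually negotiated may be useful. The sketch below assumes a Linux backup server, where /proc/mounts lists a `vers=` option for NFS mounts; the thread does not state the server's OS, and the function name is made up for illustration.

```shell
#!/bin/sh
# Print "<mount point> <NFS version>" for every NFS mount found in a
# /proc/mounts-style file. Assumption: Linux, where NFS mounts carry a
# "vers=" entry in the options column (4th field).
nfs_mount_versions() {
    awk '$3 ~ /^nfs4?$/ {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $2, vers
    }' "${1:-/proc/mounts}"
}
```

Calling `nfs_mount_versions` with no argument reads /proc/mounts; passing a file name lets you check a saved copy from another machine.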
Re: Odd non-fatal errors in amdump reports.
The previous patch broke something.
Try this new set2-r2.diff patch
Jean-Louis
On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote:
> On 2017-11-10 08:27, Jean-Louis Martineau wrote:
>> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
>>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>>>> Austin,
>>>>>>
>>>>>> It's hard to say anything with only the error message.
>>>>>>
>>>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>>>
>>>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>>>
>>>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>>>
>>>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>>>
>>>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>>>   client9 /var lev 0 FLUSH [File 0 not found]
>>>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>>>
>>>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>>>
>>>> Is Server-01 still there? Does it still contain the dump?
>>>
>>> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
>>
>> Austin,
>>
>> Can you try to see if amfetchdump can restore it?
>>
>>  * amfetchdump CONFIG client2 /boot 20171024084159
>
> amfetchdump doesn't see it, and neither does amrecover, but the files for the given parts are definitely there (I know for a fact that the dump in question has exactly one part, and the file for that does exist on the virtual tape mentioned in the log file).
>
> I'm probably not going to be able to check more on this today, but I'll likely be checking if amrestore and amadmin find can see them.
diff --git a/perl/Amanda/DB/Catalog.pm b/perl/Amanda/DB/Catalog.pm
index 56f7d70..44d2242 100644
--- a/perl/Amanda/DB/Catalog.pm
+++ b/perl/Amanda/DB/Catalog.pm
@@ -468,7 +468,7 @@ sub get_latest_write_timestamp {
if (@timestamps) {
# if we're not looking for a particular type, then this is easy
- if (!exists $params{'types'}) {
+ if (!defined $params{'types'}) {
return $timestamps[-1];
}
@@ -524,20 +524,20 @@ sub get_parts_and_dumps {
# pre-process params by appending all of the "singular" parameters to the "plurals"
push @{$params{'write_timestamps'}}, map { zeropad($_) } $params{'write_timestamp'}
- if exists($params{'write_timestamp'});
+ if defined($params{'write_timestamp'});
push @{$params{'dump_timestamps'}}, map { zeropad($_) } $params{'dump_timestamp'}
- if exists($params{'dump_timestamp'});
+ if defined($params{'dump_timestamp'});
push @{$params{'hostnames'}}, $params{'hostname'}
- if exists($params{'hostname'});
+ if defined($params{'hostname'});
push @{$params{'disknames'}}, $params{'diskname'}
- if exists($params{'diskname'});
+ if defined($params{'diskname'});
push @{$params{'levels'}}, $params{'level'}
- if exists($params{'level'});
+ if defined($params{'level'});
push @{$params{'storages'}}, $params{'storage'}
if defined($params{'storage'});
if ($get_what eq 'parts') {
push @{$params{'labels'}}, $params{'label'}
- if exists($params{'label'});
+ if defined($params{'label'});
} else {
delete $params{'labels'};
}
@@ -562,7 +562,7 @@ sub get_parts_and_dumps {
my @logfiles;
if ($params{'holding'}) {
@logfiles = ( 'holding', );
-} elsif (exists($params{'write_timestamps'})) {
+} elsif (defined($params{'write_timestamps'})) {
# if we have specific write_timestamps, the job is pretty easy.
my %timestamps_hash = map { ($_, und
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:27, Jean-Louis Martineau wrote:
> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>>> Austin,
>>>>>
>>>>> It's hard to say anything with only the error message.
>>>>>
>>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>>
>>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>>
>>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>>
>>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>>
>>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>>   client9 /var lev 0 FLUSH [File 0 not found]
>>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>>
>>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>>
>>> Is Server-01 still there? Does it still contain the dump?
>>
>> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
>
> Austin,
>
> Can you try to see if amfetchdump can restore it?
>
>  * amfetchdump CONFIG client2 /boot 20171024084159

amfetchdump doesn't see it, and neither does amrecover, but the files for the given parts are definitely there (I know for a fact that the dump in question has exactly one part, and the file for that does exist on the virtual tape mentioned in the log file).

I'm probably not going to be able to check more on this today, but I'll likely be checking if amrestore and amadmin find can see them.
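The "files are definitely there" check described above can be scripted so that it is easy to repeat next to the catalog tools. This is a rough sketch, not part of Amanda: the function name is hypothetical, and it assumes vtape dump files are named like `NNNNN.<host>.<disk>.<level>` inside each slot directory, which should be verified against the actual vtape layout.

```shell
#!/bin/sh
# List files under a vtape root whose names contain the given host.
# Assumption (not confirmed in this thread): dump files on a vtape are
# named NNNNN.<host>.<disk>.<level>, e.g. 00001.client2._boot.0
find_vtape_files() {
    vtape_root=$1
    host=$2
    find "$vtape_root" -type f -name "[0-9]*.${host}.*" | sort
}
```

For example, `find_vtape_files /path/to/vtapes client2` lists the candidate part files, which can then be compared with what `amadmin CONFIG find client2 /boot` reports from the catalog.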
Re: Odd non-fatal errors in amdump reports.
On 10/11/17 10:10 AM, Austin S. Hemmelgarn wrote:
> On 2017-11-10 10:00, Jean-Louis Martineau wrote:
>> Austin,
>>
>> Can you try the attached patch? I think it could fix the set1 and set2 errors.
>
> Yes, but I won't be able to log in this weekend to revert it if it doesn't work, so I won't be able to test it until Monday. Am I correct in assuming that it only needs to be applied on the server and not the clients?

Yes, only on the server.

Jean-Louis

>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>> Austin,
>>>>
>>>> It's hard to say anything with only the error message.
>>>>
>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>
>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>
>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 10:00, Jean-Louis Martineau wrote:
> Austin,
>
> Can you try the attached patch? I think it could fix the set1 and set2 errors.

Yes, but I won't be able to log in this weekend to revert it if it doesn't work, so I won't be able to test it until Monday. Am I correct in assuming that it only needs to be applied on the server and not the clients?

> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>> Austin,
>>>
>>> It's hard to say anything with only the error message.
>>>
>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>
>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>
>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
Re: Odd non-fatal errors in amdump reports.
Austin,
Can you try the attached patch? I think it could fix the set1 and set2
errors.
Jean-Louis
On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>> Austin,
>>
>> It's hard to say anything with only the error message.
>>
>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>
> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>
> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
diff --git a/perl/Amanda/Recovery/Planner.pm b/perl/Amanda/Recovery/Planner.pm
index 7bf09c7..ecb8cc2 100644
--- a/perl/Amanda/Recovery/Planner.pm
+++ b/perl/Amanda/Recovery/Planner.pm
@@ -235,16 +235,24 @@ sub make_plan {
my $self = shift;
my %params = @_;
-for my $rq_param (qw(plan_cb dumpspecs)) {
+for my $rq_param (qw(plan_cb )) {
croak "required parameter '$rq_param' missing"
unless exists $params{$rq_param};
}
my $status = $params{'status'};
my $dumpspecs = $params{'dumpspecs'};
+my $hostname = $params{'hostname'};
+my $diskname = $params{'diskname'};
+my $dump_timestamp = $params{'dump_timestamp'};
+my $level = $params{'level'};
my $src_labelstr = $params{'src_labelstr'};
# first, get the set of dumps that match these dumpspecs
my @dumps = Amanda::DB::Catalog::get_dumps(dumpspecs => $dumpspecs,
+ hostname => $hostname,
+ diskname => $diskname,
+ dump_timestamp => $dump_timestamp,
+ level => $level,
status => $status,
labelstr => $src_labelstr);
diff --git a/perl/Amanda/Taper/Worker.pm b/perl/Amanda/Taper/Worker.pm
index 7f205be..c501a20 100644
--- a/perl/Amanda/Taper/Worker.pm
+++ b/perl/Amanda/Taper/Worker.pm
@@ -944,7 +944,10 @@ sub setup_and_start_dump {
undef);
my @storage_list = ( $self->{'src_storage'} );
Amanda::Recovery::Planner::make_plan(
- dumpspecs => \@dumpspecs,
+ hostname => $self->{'hostname'},
+ diskname => $self->{'diskname'},
+ dump_timestamp => $self->{'datestamp'},
+ level => $self->{'level'},
changer => $chg,
storage_list => \@storage_list,
only_in_storage => 1,
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:27, Jean-Louis Martineau wrote:
> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>>> Austin,
>>>>>
>>>>> It's hard to say anything with only the error message.
>>>>>
>>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>>
>>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>>
>>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>>
>>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>>
>>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>>   client9 /var lev 0 FLUSH [File 0 not found]
>>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>>
>>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>>
>>> Is Server-01 still there? Does it still contain the dump?
>>
>> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
>
> Austin,
>
> Can you try to see if amfetchdump can restore it?
>
>  * amfetchdump CONFIG client2 /boot 20171024084159

At the moment, I'm re-testing things after tweaking some NFS parameters for the virtual tape library (apparently the FreeNAS server that's actually storing the data didn't have NFSv4 turned on, so it was mounted with NFSv3, which we've had issues with before on our network), so I can't exactly check immediately, but assuming the problem repeats, I'll do that first thing once the test dump is done.
Re: Odd non-fatal errors in amdump reports.
On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>>> Austin,
>>>>
>>>> It's hard to say anything with only the error message.
>>>>
>>>> Can you post the amdump. and log..0 for the 2 backup sets that fail?
>>>
>>> I've attached the files (I would put them inline, but one of the sets has over 100 DLEs, so the amdump file is huge, and the others are still over 100k each, and I figured nobody wants to try and wade through those in-line).
>>>
>>> The set1 and set2 files are for the two backup sets that show the header mismatch error, and the set3 files are for the one that claims failures in the dump summary.
>>
>> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the errors in the 'FAILURE DUMP SUMMARY':
>>
>>   client2 /boot lev 0 FLUSH [File 0 not found]
>>   client3 /boot lev 0 FLUSH [File 0 not found]
>>   client7 /boot lev 0 FLUSH [File 0 not found]
>>   client8 /boot lev 0 FLUSH [File 0 not found]
>>   client0 /boot lev 0 FLUSH [File 0 not found]
>>   client9 /boot lev 0 FLUSH [File 0 not found]
>>   client9 /srv lev 0 FLUSH [File 0 not found]
>>   client9 /var lev 0 FLUSH [File 0 not found]
>>   server0 /boot lev 0 FLUSH [File 0 not found]
>>   client10 /boot lev 0 FLUSH [File 0 not found]
>>   client11 /boot lev 0 FLUSH [File 0 not found]
>>   client12 /boot lev 0 FLUSH [File 0 not found]
>>
>> They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump.
>>
>> Is Server-01 still there? Does it still contain the dump?
>
> OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?

Austin,

Can you try to see if amfetchdump can restore it?

 * amfetchdump CONFIG client2 /boot 20171024084159

Jean-Louis
Re: Odd non-fatal errors in amdump reports.
On 2017-11-08 08:03, Jean-Louis Martineau wrote:
> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>> Austin,
>>>
>>> It's hard to say something with only the error message.
>>>
>>> Can you post the amdump. and log..0 for the 2
>>> backup sets that fail.
>>
>> I've attached the files (I would put them inline, but one of the sets
>> has over 100 DLE's, so the amdump file is huge, and the others are
>> still over 100k each, and I figured nobody wants to try and wade
>> through those in-line).
>>
>> The set1 and set2 files are for the two backup sets that show the
>> header mismatch error, and the set3 files are for the one that claims
>> failures in the dump summary.
>
> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the
> errors in the 'FAILURE DUMP SUMMARY':
>
> client2 /boot lev 0 FLUSH [File 0 not found]
> client3 /boot lev 0 FLUSH [File 0 not found]
> client7 /boot lev 0 FLUSH [File 0 not found]
> client8 /boot lev 0 FLUSH [File 0 not found]
> client0 /boot lev 0 FLUSH [File 0 not found]
> client9 /boot lev 0 FLUSH [File 0 not found]
> client9 /srv lev 0 FLUSH [File 0 not found]
> client9 /var lev 0 FLUSH [File 0 not found]
> server0 /boot lev 0 FLUSH [File 0 not found]
> client10 /boot lev 0 FLUSH [File 0 not found]
> client11 /boot lev 0 FLUSH [File 0 not found]
> client12 /boot lev 0 FLUSH [File 0 not found]
>
> They are VAULT attempts, not FLUSH. Looking only at the first entry, it
> tries to vault 'client2 /boot 0 20171024084159', which it expects to find
> on tape Server-01. It is an older dump. Is Server-01 still there? Does it
> still contain the dump?

OK, I've done some further investigation by tweaking the labeling a bit
(which actually fixed a purely cosmetic issue we were having), but I'm
still seeing the same problem that prompted this thread, and I can confirm
that the dumps are where Amanda is trying to look for them; it's just not
seeing them for some reason.

I hadn't thought of this before, but could it have something to do with
the virtual tape library being auto-mounted over NFS on the backup server?
Re: Odd non-fatal errors in amdump reports.
On 2017-11-08 08:03, Jean-Louis Martineau wrote:
> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
>> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>>> Austin,
>>>
>>> It's hard to say something with only the error message.
>>>
>>> Can you post the amdump. and log..0 for the 2
>>> backup sets that fail.
>>
>> I've attached the files (I would put them inline, but one of the sets
>> has over 100 DLE's, so the amdump file is huge, and the others are
>> still over 100k each, and I figured nobody wants to try and wade
>> through those in-line).
>>
>> The set1 and set2 files are for the two backup sets that show the
>> header mismatch error, and the set3 files are for the one that claims
>> failures in the dump summary.
>
> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the
> errors in the 'FAILURE DUMP SUMMARY':
>
> client2 /boot lev 0 FLUSH [File 0 not found]
> client3 /boot lev 0 FLUSH [File 0 not found]
> client7 /boot lev 0 FLUSH [File 0 not found]
> client8 /boot lev 0 FLUSH [File 0 not found]
> client0 /boot lev 0 FLUSH [File 0 not found]
> client9 /boot lev 0 FLUSH [File 0 not found]
> client9 /srv lev 0 FLUSH [File 0 not found]
> client9 /var lev 0 FLUSH [File 0 not found]
> server0 /boot lev 0 FLUSH [File 0 not found]
> client10 /boot lev 0 FLUSH [File 0 not found]
> client11 /boot lev 0 FLUSH [File 0 not found]
> client12 /boot lev 0 FLUSH [File 0 not found]
>
> They are VAULT attempts, not FLUSH. Looking only at the first entry, it
> tries to vault 'client2 /boot 0 20171024084159', which it expects to find
> on tape Server-01. It is an older dump. Is Server-01 still there? Does it
> still contain the dump?

Hmm, looks like that's a leftover from changing our labeling format
shortly after switching to this new configuration. I thought I had purged
all the stuff with the old label scheme, but I guess not. It somewhat
surprises me that this doesn't give any kind of error indication in the
e-mail report beyond the 'FAILED' line in the dump summary.
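One rough way to spot leftovers from an old label scheme is to compare the known tape labels against the labelstr currently in use. The sketch below is only an illustration: the label names and the "Home-" prefix are assumptions (the real labelstr in the posted config is redacted), and Python's `re` module is standing in for however Amanda itself evaluates labelstr.

```python
import re

def stale_labels(labels, labelstr):
    """Return the labels that do NOT match the current labelstr pattern."""
    pattern = re.compile(labelstr)
    return [label for label in labels if pattern.search(label) is None]

# Hypothetical labels: one from an old scheme, two from the current one.
labels = ["Server-01", "Home-0001", "Home-0002"]

# Hypothetical labelstr, modeled on the redacted "^-[0-9][0-9]*$" in the config.
current_labelstr = r"^Home-[0-9][0-9]*$"

leftovers = stale_labels(labels, current_labelstr)
print(leftovers)
```

Anything this reports is a candidate for a closer look before purging, not proof that the tape is unused.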
Re: Odd non-fatal errors in amdump reports.
On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>> Austin,
>>
>> It's hard to say something with only the error message.
>>
>> Can you post the amdump. and log..0 for the 2
>> backup sets that fail.
>
> I've attached the files (I would put them inline, but one of the sets
> has over 100 DLE's, so the amdump file is huge, and the others are
> still over 100k each, and I figured nobody wants to try and wade
> through those in-line).
>
> The set1 and set2 files are for the two backup sets that show the
> header mismatch error, and the set3 files are for the one that claims
> failures in the dump summary.

I looked at set3; the errors in the 'DUMP SUMMARY' are related to the
errors in the 'FAILURE DUMP SUMMARY':

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

They are VAULT attempts, not FLUSH. Looking only at the first entry, it
tries to vault 'client2 /boot 0 20171024084159', which it expects to find
on tape Server-01. It is an older dump. Is Server-01 still there? Does it
still contain the dump?

Jean-Louis
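With over a hundred DLEs in a set, it can help to group the 'FAILURE DUMP SUMMARY' lines by their bracketed reason before chasing individual dumps. The following is a minimal sketch that assumes the lines look exactly like the ones quoted above (host, disk, "lev N", a tag, then the reason in brackets); the function name and the regular expression are illustrative and are not part of Amanda.

```python
import re
from collections import defaultdict

# Matches lines such as: client2 /boot lev 0 FLUSH [File 0 not found]
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+lev\s+(\d+)\s+(\S+)\s+\[(.*)\]$")

def group_failures(report_lines):
    """Group (host, disk, level) tuples by the bracketed failure reason."""
    by_reason = defaultdict(list)
    for line in report_lines:
        m = LINE_RE.match(line.strip())
        if m:
            host, disk, level, _tag, reason = m.groups()
            by_reason[reason].append((host, disk, int(level)))
    return dict(by_reason)

# A few of the lines from the report quoted in this thread.
sample = [
    "client2 /boot lev 0 FLUSH [File 0 not found]",
    "client9 /srv lev 0 FLUSH [File 0 not found]",
    "server0 /boot lev 0 FLUSH [File 0 not found]",
]

failures = group_failures(sample)
for reason, entries in failures.items():
    print(reason, len(entries), entries)
```

Lines that do not fit the assumed layout are skipped silently, so a count lower than expected is a hint that the report format differs.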
Re: Odd non-fatal errors in amdump reports.
On 2017-11-07 10:22, Jean-Louis Martineau wrote:
Austin,
It's hard to say something with only the error message.
Can you post the amdump. and log..0 for the 2
backup sets that fail.
Yes, though it may take me a while since our policy is pretty strict
about scrubbing hostnames and usernames from any internal files we make
visible publicly.
Just to clarify, it will end up being 3 total pairs of files, two from
backup sets that show the first issue I mentioned (the complaint about a
header mismatch), and one from the backup set showing the second issue I
mentioned (the apparently bogus dump failures listed in the dump summary).
The tapedev of the aws changer can be written like:
tapedev "chg-multi:s3:/slot-{0..127}"
Thanks, I hadn't known that the configuration file syntax supported
sequences like this; that makes it look so much nicer!
Jean-Louis
On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
> Where I work, we recently switched from manually triggered vaulting to
> automatic vaulting using the vault-storage, vault, and dump-selection
> options. Things appear to be working correctly, but we keep getting
> some odd non-fatal error messages (that might be bogus as well, since
> I've verified the dumps mentioned restore correctly) in the amdump
> e-mails. I've been trying to figure out these 'errors' for the past
> few weeks now, and I'm hoping someone on the list might have some advice
> (or better yet, might recognize the symptoms and know how to fix them).
>
> In our configuration, we have three different backup sets (each is on
> its own schedule). Of these, two are consistently showing the following
> error in the amdump e-mail report (I've redacted hostnames and exact paths;
> the second path listed, though, is a parent directory of the first):
>
> taper: FATAL Header of dumpfile does not match command from driver 0
> XXX /home/X 20171031074642 -- 0 XXX
> /home/XX 20171031074642 at
> /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168
>
> For a given backup set, the particular hostname and paths are always the
> same, but the backup appears to get taped correctly, and restores
> correctly as well.
>
> With the third backup set, we're regularly seeing things like the
> following in the dump summary section, but no other visible error
> messages:
>
>                                 DUMPER STATS               TAPER STATS
> HOSTNAME  DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
> --------  -----  -  -------  ------  -----  ------  -----  ------  ----
> XX        /boot  0                      --  FAILED
> XX        /boot  1       10      10     --    0:00  168.8    0:00   0.0
>
> In this case, the particular DLE's affected are always the same,
> and the first line that claims a failure always shows dump level
> zero, even when the backup is supposed to be at another level.
> Just like the other error, the affected dumps always restore
> correctly when tested, and get correctly vaulted as well. The
> affected DLE's are only on Linux systems, but it seems to not
> care what distro or Amanda version is being used (it has affected
> Debian, Gentoo, and Fedora systems, and covers 5 different
> Amanda client versions), and are invariably small (sub-gigabyte)
> filesystems, but I've not found any other commonality among them.
>
> All three sets use essentially the same amanda.conf file (the
> differences are literally just in when they get run), which
> I've attached in-line at the end of this e-mail with
> sensitive data redacted. The thing I find particularly odd is
> that this config is essentially identical to what I use on my
> personal systems, which are not exhibiting either problem.
>
> 8<
>
> org "X"
> mailto "admin"
> dumpuser "amanda"
> inparallel 2
> dumporder "Ss"
> taperalgo largestfit
>
> displayunit "k"
> netusage 800 Kbps
>
> dumpcycle 4 weeks
> runspercycle 28
> tapecycle 128 tapes
>
> bumppercent 20
> bumpdays 2
>
> etimeout 900
> dtimeout 1800
> ctimeout 30
>
> device_output_buffer_size 256M
>
> compress-index no
>
> flush-threshold-dumped 0
> flush-threshold-scheduled 0
> taperflush 0
> autoflush yes
>
> runtapes 16
>
> define changer vtl {
> tapedev "chg-disk:/net/XX/amanda/X"
> changerfile "/etc/amanda/X/changer"
> property "num-slot" "128"
> property "auto-create-slot" "yes"
> }
>
> define changer aws {
> tapedev "chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
> changerfile "/etc/amanda/X/s3-ch
Re: Odd non-fatal errors in amdump reports.
Austin,
It's hard to say something with only the error message.
Can you post the amdump. and log..0 for the 2
backup sets that fail.
The tapedev of the aws changer can be written like:
tapedev "chg-multi:s3:/slot-{0..127}"
Jean-Louis
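To make the equivalence concrete, here is a small Python sketch of brace expansion showing that the `{0..127}` range form names the same 128 slots as the explicit comma-separated list in the original config. It is an illustration of the notation only; `expand_brace` is a made-up helper and is not how Amanda parses tapedev internally.

```python
import re

def expand_brace(pattern):
    """Expand the first {a..b} or {a,b,c} group in pattern into a list of strings."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    body = m.group(1)
    rng = re.fullmatch(r"(\d+)\.\.(\d+)", body)
    if rng:
        # Range form: both ends are inclusive.
        lo, hi = int(rng.group(1)), int(rng.group(2))
        items = [str(i) for i in range(lo, hi + 1)]
    else:
        # List form: plain comma-separated alternatives.
        items = body.split(",")
    return [pattern[:m.start()] + item + pattern[m.end():] for item in items]

short_form = expand_brace("chg-multi:s3:/slot-{0..127}")
explicit = "chg-multi:s3:/slot-{" + ",".join(str(i) for i in range(128)) + "}"
long_form = expand_brace(explicit)

print(len(short_form), short_form[0], short_form[-1])
print(short_form == long_form)
```

Both forms come out as the same list, so switching to the range form should only change how the line reads, assuming the installed Amanda version accepts it as described above.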
On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
Where I work, we recently switched from manually triggered vaulting to
automatic vaulting using the vault-storage, vault, and dump-selection
options. Things appear to be working correctly, but we keep getting
some odd non-fatal error messages (that might be bogus as well, since
I've verified the dumps mentioned restore correctly) in the amdump
e-mails. I've been trying to figure out these 'errors' for the past
few weeks now, and I'm hoping someone on the list might have some advice
(or better yet, might recognize the symptoms and know how to fix them).
In our configuration, we have three different backup sets (each is on
its own schedule). Of these, two are consistently showing the following
error in the amdump e-mail report (I've redacted hostnames and exact paths,
the second path listed though is a parent directory of the first):
taper: FATAL Header of dumpfile does not match command from driver 0 XXX
/home/X 20171031074642 -- 0 XXX /home/XX
20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm
line 1168
For a given backup set, the particular hostname and paths are always the
same, but the backup appears to get taped correctly, and restores
correctly as well.
With the third backup set, we're regularly seeing things like the
following in the dump summary section, but no other visible error
messages:
                                DUMPER STATS               TAPER STATS
HOSTNAME  DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
--------  -----  -  -------  ------  -----  ------  -----  ------  ----
XX        /boot  0                      --  FAILED
XX        /boot  1       10      10     --    0:00  168.8    0:00   0.0
In this case, the particular DLE's affected are always the same,
and the first line that claims a failure always shows dump level
zero, even when the backup is supposed to be at another level.
Just like the other error, the affected dumps always restore
correctly when tested, and get correctly vaulted as well. The
affected DLE's are only on Linux systems, but it seems to not
care what distro or Amanda version is being used (it has affected
Debian, Gentoo, and Fedora systems, and covers 5 different
Amanda client versions), and are invariably small (sub-gigabyte)
filesystems, but I've not found any other commonality among them.
All three sets use essentially the same amanda.conf file (the
differences are literally just in when they get run), which
I've attached in-line at the end of this e-mail with
sensitive data redacted. The thing I find particularly odd is
that this config is essentially identical to what I use on my
personal systems, which are not exhibiting either problem.
8<
org "X"
mailto "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit
displayunit "k"
netusage 800 Kbps
dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes
bumppercent 20
bumpdays 2
etimeout 900
dtimeout 1800
ctimeout 30
device_output_buffer_size 256M
compress-index no
flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes
runtapes 16
define changer vtl {
tapedev "chg-disk:/net/XX/amanda/X"
changerfile "/etc/amanda/X/changer"
property "num-slot" "128"
property "auto-create-slot" "yes"
}
define changer aws {
tapedev "chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
changerfile "/etc/amanda/X/s3-changer"
device-property "S3_SSL" "YES"
device-property "S3_ACCESS_KEY" ""
device-property "S3_SECRET_KEY" ""
device-property "S3_MULTI_PART_UPLOAD" "YES"
device-property "CREATE_BUCKET" "NO"
device-property "S3_BUCKET_LOCATION" "X"
device-property "STORAGE_API" "AWS4"
}
define storage local-vtl {
tpchanger "vtl"
tapepool "$r"
tapetype "V64G"
labelstr "^-[0-9][0-9]*$"
autolabel "-%%%" any
erase-on-full YES
er
Odd non-fatal errors in amdump reports.
Where I work, we recently switched from manually triggered vaulting to
automatic vaulting using the vault-storage, vault, and dump-selection
options. Things appear to be working correctly, but we keep getting
some odd non-fatal error messages (that might be bogus as well, since
I've verified the dumps mentioned restore correctly) in the amdump
e-mails. I've been trying to figure out these 'errors' for the past
few weeks now, and I'm hoping someone on the list might have some advice
(or better yet, might recognize the symptoms and know how to fix them).
In our configuration, we have three different backup sets (each is on
its own schedule). Of these, two are consistently showing the following
error in the amdump e-mail report (I've redacted hostnames and exact paths,
the second path listed though is a parent directory of the first):
taper: FATAL Header of dumpfile does not match command from driver 0 XXX
/home/X 20171031074642 -- 0 XXX /home/XX
20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm
line 1168
For a given backup set, the particular hostname and paths are always the
same, but the backup appears to get taped correctly, and restores
correctly as well.
With the third backup set, we're regularly seeing things like the
following in the dump summary section, but no other visible error
messages:
                                DUMPER STATS               TAPER STATS
HOSTNAME  DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
--------  -----  -  -------  ------  -----  ------  -----  ------  ----
XX        /boot  0                      --  FAILED
XX        /boot  1       10      10     --    0:00  168.8    0:00   0.0
In this case, the particular DLE's affected are always the same,
and the first line that claims a failure always shows dump level
zero, even when the backup is supposed to be at another level.
Just like the other error, the affected dumps always restore
correctly when tested, and get correctly vaulted as well. The
affected DLE's are only on Linux systems, but it seems to not
care what distro or Amanda version is being used (it has affected
Debian, Gentoo, and Fedora systems, and covers 5 different
Amanda client versions), and are invariably small (sub-gigabyte)
filesystems, but I've not found any other commonality among them.
All three sets use essentially the same amanda.conf file (the
differences are literally just in when they get run), which
I've attached in-line at the end of this e-mail with
sensitive data redacted. The thing I find particularly odd is
that this config is essentially identical to what I use on my
personal systems, which are not exhibiting either problem.
8<
org "X"
mailto "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit
displayunit "k"
netusage 800 Kbps
dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes
bumppercent 20
bumpdays 2
etimeout 900
dtimeout 1800
ctimeout 30
device_output_buffer_size 256M
compress-index no
flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes
runtapes 16
define changer vtl {
tapedev "chg-disk:/net/XX/amanda/X"
changerfile "/etc/amanda/X/changer"
property "num-slot" "128"
property "auto-create-slot" "yes"
}
define changer aws {
tapedev "chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
changerfile "/etc/amanda/X/s3-changer"
device-property "S3_SSL" "YES"
device-property "S3_ACCESS_KEY" ""
device-property "S3_SECRET_KEY" ""
device-property "S3_MULTI_PART_UPLOAD" "YES"
device-property "CREATE_BUCKET" "NO"
device-property "S3_BUCKET_LOCATION" "X"
device-property "STORAGE_API" "AWS4"
}
define storage local-vtl {
tpchanger "vtl"
tapepool "$r"
tapetype "V64G"
labelstr "^-[0-9][0-9]*$"
autolabel "-%%%" any
erase-on-full YES
erase-on-failure YES
vault cloud 0
}
define storage cloud {
tpchanger "aws"
tapepool "$r"
tapetype "S3TAPE"
labelstr "^Vault--[0-9][0-9]*$"
autolabel "Vault--%%%" any
erase-on-full YES
erase-on-failure YES
