Re: did it again. -- crc differ

2020-12-19 Thread Gene Heskett
On Saturday 19 December 2020 15:26:07 Nathan Stratton Treadway wrote:

> On Sat, Dec 19, 2020 at 14:43:56 -0500, Gene Heskett wrote:
> > On Saturday 19 December 2020 12:12:07 Nathan Stratton Treadway wrote:
> > > On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> > > > new error file, from /home on GO704:(word wrap off)
> > > >
> > > > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1
> > >
> > > Okay, that output looks good.
> > >
> > > for completeness, can you post the section from this Amanda Report
> > > covering this error?
> >
> > In the last post.
>
> In that message I see the quoted "FAILURE DUMP SUMMARY" section for
> the earlier failure but not the report for when GO704:/home failed...
>
Here it is:
  driver: GO704 /home 20201219085654 0 [Will retry dump because of holding disk error: source server crc (1452994d:2018270728) and input server crc (adcf8473:2018270728) differ)]
  taper: tape Dailys-25 kb 16735269 fm 79 [OK]

and:
GO704   /home   0  7151  1925   --  20:52  1574.2  0:01 1970967.0 PARTIAL
FLUSH 20:51  1575.5

Thanks Nathan
>
>   Nathan
>
> --
>-- Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic
> region Ray Ontko & Co.  -  Software consulting services  -  
> http://www.ontko.com/ GPG Key:
> http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239 Key
> fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239



Copyright 2019 by Maurice E. Heskett
Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 


Re: did it again. -- crc differ

2020-12-19 Thread Gene Heskett
On Saturday 19 December 2020 15:26:07 Nathan Stratton Treadway wrote:

> On Sat, Dec 19, 2020 at 14:43:56 -0500, Gene Heskett wrote:
> > On Saturday 19 December 2020 12:12:07 Nathan Stratton Treadway wrote:
> > > On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> > > > new error file, from /home on GO704:(word wrap off)
> > > >
> > > > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1
> > >
> > > Okay, that output looks good.
> > >
> > > for completeness, can you post the section from this Amanda Report
> > > covering this error?
> >
> > In the last post.
>
> In that message I see the quoted "FAILURE DUMP SUMMARY" section for
> the earlier failure but not the report for when GO704:/home failed...
>
> > No hits on the crc from the previous post, adcf8473:2018270728, any
> > place in that /var/log/amanda tree.
>
> Anything under /tmp/amanda/ (there on GO704)?
Just the excludes, all alike, 42 bytes containing:
/*.iso
.gvfs
./Ksocket-gene
./orbit-gene

>
>   Nathan
>


Re: did it again. -- crc differ

2020-12-19 Thread Nathan Stratton Treadway
On Sat, Dec 19, 2020 at 14:43:56 -0500, Gene Heskett wrote:
> On Saturday 19 December 2020 12:12:07 Nathan Stratton Treadway wrote:
> 
> > On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> > > new error file, from /home on GO704:(word wrap off)
> > >
> > > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1
> >
> > Okay, that output looks good.
> >
> > for completeness, can you post the section from this Amanda Report
> > covering this error?
> 
> In the last post.

In that message I see the quoted "FAILURE DUMP SUMMARY" section for the
earlier failure but not the report for when GO704:/home failed...



> No hits on the crc from the previous post, adcf8473:2018270728, any place 
> in that /var/log/amanda tree.

Anything under /tmp/amanda/ (there on GO704)?

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: did it again. -- crc differ

2020-12-19 Thread Gene Heskett
On Saturday 19 December 2020 12:12:07 Nathan Stratton Treadway wrote:

> On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> > new error file, from /home on GO704:(word wrap off)
> >
> > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1
>
> Okay, that output looks good.
>
> for completeness, can you post the section from this Amanda Report
> covering this error?

In the last post.

> > New crc's
> >
> > root@coyote:GenesAmandaHelper-0.61$ grep adcf8473:2018270728
> > /usr/local/var/amanda/Daily/*
>
> GO704 is a separate Amanda client machine, right?

Yes, it's one of my CNC driver machines. An old Dell with an identical 
240GB SSD drive in it.
> Can you do a 
> similar grep in the amanda debug/log files over on that machine, too?

No hits on the crc from the previous post, adcf8473:2018270728, any place 
in that /var/log/amanda tree.

root@GO704:/var/log/amanda# grep -R adcf8473:2018270728 *
root@GO704:/var/log/amanda#   


Thanks Nathan
>
>   Nathan
>


Re: did it again. -- crc differ

2020-12-19 Thread Nathan Stratton Treadway
On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> new error file, from /home on GO704:(word wrap off)
> 
> dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1

Okay, that output looks good.

For completeness, can you post the section from this Amanda Report
covering this error?


> New crc's
> 
> root@coyote:GenesAmandaHelper-0.61$ grep adcf8473:2018270728 
> /usr/local/var/amanda/Daily/*

GO704 is a separate Amanda client machine, right?  Can you do a similar
grep in the Amanda debug/log files over on that machine, too?


Nathan




Re: did it again. -- crc differ

2020-12-19 Thread Gene Heskett
On Saturday 19 December 2020 09:42:55 Nathan Stratton Treadway wrote:

> On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote:
> > But the problem is not fixed:
>
> Well, at least this time it's a one-part dump file, so that may make
> investigation a little easier.
>
> > FAILURE DUMP SUMMARY:
> >   rpi4 /usr/lib lev 0  partial taper: source server crc
> > (efe0c707:1538583893) and input server crc (fa79e777:1538583893)
> > differ)
> >   rpi4 /usr/lib lev 0  was successfully retried
> >
> > But the failed dump is still in the holding disk:
> >
> > root@coyote:config-bak$ ls -l /sdb/dumps/20201219020104/
> > total 1502560
> > -rw--- 1 amanda amanda 1538616661 Dec 19 02:13 rpi4._usr_lib.0
> >
> > From the emailed report:
> >
> >   driver: rpi4 /usr/lib 20201219020104 0 [Will retry dump because of
> > holding disk error: source server crc (efe0c707:1538583893) and
> > input server crc (fa79e777:1538583893) differ)] taper: tape
> > Dailys-24 kb 16495500 fm 79 [OK]
> >
> > and:
> > rpi4  /usr/lib 0 3273 1467  -- 5:23 10366.4  0:01 1502523.0 PARTIAL
> > FLUSH  5:11  4831.3
> >
> > Even the sizes don't match so of course the crc's won't either.
>
> Note that the two sizes mentioned in the error message do match
> (1538583893), so I think the full file is getting transferred.
>
> (The file on the holding disk is 32kiB larger, i.e. the size of the
> Amanda header:  1538616661-1538583893=32768 .)
>
>
> What's the header of that holding-disk file look like? (e.g.
>   $ sudo dd if=/sdb/dumps/20201219020104/rpi4._usr_lib.0 bs=32k
> count=1 )
new error file, from /home on GO704:(word wrap off)

dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1

AMANDA: FILE 20201219085654 GO704 /home  lev 0 comp .gz program APPLICATION
APPLICATION=amgtar
ORIGSIZE=7322300
SERVER-CRC=adcf8473:2018270728
DLE=<
  APPLICATION
  /home
  0
  bsdtcp
  BEST
  YES
  YES
  AMANDA
  
/GenesAmandaHelper-0.61/excludes
  
  
amgtar

  ignore
  :_socket_ignored$  file_changed_as_we_read_it$


  one-file-system
  yes


  check-device
  no

  

ENDDLE
To restore, position tape at start of file and run:
dd if= skip=1 | /bin/gzip -dc | 
/usr/lib/amanda/application/amgtar restore [./file-to-restore]+

That's a different error, and that machine's background activity may have 
caused apt-get to refresh its list of available updates at about that
time of the morning. That, however, should not have affected /home.
There was no other commanded activity at the time, although I do have 
an ssh session open to all machines from this machine full time, and 
they are all mounted here via sshnet. Beats nfs like a 
white nosed mule since it Just Works.
>
> Do you get any hits when you grep the Amanda debug and log files for
> those two CRC values ( efe0c707 and fa79e777 )?

New crc's

root@coyote:GenesAmandaHelper-0.61$ grep adcf8473:2018270728 /usr/local/var/amanda/Daily/*
/usr/local/var/amanda/Daily/amdump.1:driver: result time 2044.439 from chunker3: DONE 03-00051 1970967 "adcf8473:2018270728" "[sec 1253.143125 kb 1970967 kps 1572.818747]"
/usr/local/var/amanda/Daily/amdump.1:driver: result time 2045.654 from taper0: PARTIAL worker0-0 02-00088 INPUT-ERROR TAPE-GOOD "1452994d:2018270728" "[sec 1.00 bytes 2018270728 kps 1970967.00 orig-kb 7322300]" "source server crc (1452994d:2018270728) and input server crc (adcf8473:2018270728) differ)" ""
/usr/local/var/amanda/Daily/amdump.1:driver: taper failed GO704 /home: source server crc (1452994d:2018270728) and input server crc (adcf8473:2018270728) differ)
/usr/local/var/amanda/Daily/amdump.20201219085654:driver: result time 2044.439 from chunker3: DONE 03-00051 1970967 "adcf8473:2018270728" "[sec 1253.143125 kb 1970967 kps 1572.818747]"
/usr/local/var/amanda/Daily/amdump.20201219085654:driver: result time 2045.654 from taper0: PARTIAL worker0-0 02-00088 INPUT-ERROR TAPE-GOOD "1452994d:2018270728" "[sec 1.00 bytes 2018270728 kps 1970967.00 orig-kb 7322300]" "source server crc (1452994d:2018270728) and input server crc (adcf8473:2018270728) differ)" ""
/usr/local/var/amanda/Daily/amdump.20201219085654:driver: taper failed GO704 /home: source server crc (1452994d:2018270728) and input server crc (adcf8473:2018270728) differ)
grep: /usr/local/var/amanda/Daily/curinfo: Is a directory
grep: /usr/local/var/amanda/Daily/gnutar-lists: Is a directory
grep: /usr/local/var/amanda/Daily/index: Is a directory
/usr/local/var/amanda/Daily/log:SUCCESS chunker GO704 /home 20201219085654 0 adcf8473:2018270728 [sec 1253.143125 kb 1970967 kps 1572.818747]
/usr/local/var/amanda/Daily/log:PARTIAL taper "ST:Daily" "POOL:Daily" GO704 /home 20201219085654 1 0 :0 :0 adcf8473:2018270728 [sec 1.00 bytes 2018270728 kps 1970967.00 orig-kb 7322300] "source server crc (1452994d:2018270728) and input server crc (adcf8473:2018270728) differ)"
/usr/local/var/amanda/Daily/log:INFO driver GO704 /home 2020121

Re: did it again. -- crc differ

2020-12-19 Thread Gene Heskett
On Saturday 19 December 2020 09:42:55 Nathan Stratton Treadway wrote:

> On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote:
> > But the problem is not fixed:
>
> Well, at least this time it's a one-part dump file, so that may make
> investigation a little easier.
>
> > FAILURE DUMP SUMMARY:
> >   rpi4 /usr/lib lev 0  partial taper: source server crc
> > (efe0c707:1538583893) and input server crc (fa79e777:1538583893)
> > differ)
> >   rpi4 /usr/lib lev 0  was successfully retried
> >
> > But the failed dump is still in the holding disk:
> >
> > root@coyote:config-bak$ ls -l /sdb/dumps/20201219020104/
> > total 1502560
> > -rw--- 1 amanda amanda 1538616661 Dec 19 02:13 rpi4._usr_lib.0
> >
> > From the emailed report:
> >
> >   driver: rpi4 /usr/lib 20201219020104 0 [Will retry dump because of
> > holding disk error: source server crc (efe0c707:1538583893) and
> > input server crc (fa79e777:1538583893) differ)] taper: tape
> > Dailys-24 kb 16495500 fm 79 [OK]
> >
> > and:
> > rpi4  /usr/lib 0 3273 1467  -- 5:23 10366.4  0:01 1502523.0 PARTIAL
> > FLUSH  5:11  4831.3
> >
> > Even the sizes don't match so of course the crc's won't either.
>
> Note that the two sizes mentioned in the error message do match
> (1538583893), so I think the full file is getting transferred.
>
> (The file on the holding disk is 32kiB larger, i.e. the size of the
> Amanda header:  1538616661-1538583893=32768 .)

That makes sense, but it's been nuked as I'm running another test backup 
right now, and those leftovers are an error. I am trying to get my 
script to send kmail a dcop message, but amanda apparently has no rights 
to use dcop. Message posted to trinity about that.
>
> What's the header of that holding-disk file look like? (e.g.
>   $ sudo dd if=/sdb/dumps/20201219020104/rpi4._usr_lib.0 bs=32k
> count=1 )

The next time it happens I will investigate before I nuke the trash.

> Do you get any hits when you grep the Amanda debug and log files for
> those two CRC values ( efe0c707 and fa79e777 )?
>
I don't know, and since I no longer have the failure's data, there's not 
a lot of use grepping the logs.

But it failed on a level 0 of /GO704/home this time, so I'll post a 
separate message with those details when collected.
>
>   Nathan
>
>
>


Re: did it again. -- crc differ

2020-12-19 Thread Nathan Stratton Treadway
On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote:
> But the problem is not fixed:

Well, at least this time it's a one-part dump file, so that may make
investigation a little easier.

> 
> FAILURE DUMP SUMMARY:
>   rpi4 /usr/lib lev 0  partial taper: source server crc (efe0c707:1538583893) 
> and input server crc (fa79e777:1538583893) 
> differ)
>   rpi4 /usr/lib lev 0  was successfully retried
> 
> But the failed dump is still in the holding disk:
> 
> root@coyote:config-bak$ ls -l /sdb/dumps/20201219020104/
> total 1502560
> -rw--- 1 amanda amanda 1538616661 Dec 19 02:13 rpi4._usr_lib.0
> 
> From the emailed report:
> 
>   driver: rpi4 /usr/lib 20201219020104 0 [Will retry dump because of holding 
> disk error: source server crc 
> (efe0c707:1538583893) and input server crc (fa79e777:1538583893) differ)]
>   taper: tape Dailys-24 kb 16495500 fm 79 [OK]
> 
> and:
> rpi4  /usr/lib 0 3273 1467  -- 5:23 10366.4  0:01 1502523.0 PARTIAL FLUSH  
> 5:11  4831.3
> 
> Even the sizes don't match so of course the crc's won't either.

Note that the two sizes mentioned in the error message do match
(1538583893), so I think the full file is getting transferred.

(The file on the holding disk is 32kiB larger, i.e. the size of the
Amanda header:  1538616661-1538583893=32768 .)
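That size check can be scripted. The following is only a sketch, assuming a POSIX shell: the helper name check_holding_size and the HEADER_BYTES variable are made up for illustration, and 32768 is the header size deduced from the two numbers above, not a value read out of Amanda.

```shell
# Hypothetical helper (not part of Amanda): check that a holding-disk file is
# exactly one header block larger than the size shown after ":" in the CRC.
HEADER_BYTES=32768   # 32 KiB, as deduced from the two sizes above

check_holding_size() {
    file=$1
    crc_size=$2
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # wc -c counts bytes; tr strips the padding some wc versions print.
    actual=$(wc -c < "$file" | tr -d ' ')
    expected=$((crc_size + HEADER_BYTES))
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual = $crc_size + $HEADER_BYTES"
    else
        echo "mismatch: file is $actual bytes, expected $expected"
        return 1
    fi
}
```

For the file listed above the call would be something like check_holding_size /sdb/dumps/20201219020104/rpi4._usr_lib.0 1538583893, run as a user who can read the holding disk.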


What's the header of that holding-disk file look like? (e.g.
  $ sudo dd if=/sdb/dumps/20201219020104/rpi4._usr_lib.0 bs=32k count=1
)

Do you get any hits when you grep the Amanda debug and log files for
those two CRC values ( efe0c707 and fa79e777 )?
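One way to run that search in a single pass is sketched below. The function name find_crc_hits is hypothetical, and the directories to search are whatever log and debug locations the installation actually uses; it assumes a grep that supports -r (GNU and BSD grep both do), which also descends into subdirectories instead of skipping them.

```shell
# Hypothetical helper: search log directories for any of several CRC strings.
# $1 is a space-separated list of CRC values; remaining args are files/dirs.
find_crc_hits() {
    crcs=$1
    shift
    # Turn "aaa bbb" into the extended-regex alternation "aaa|bbb".
    pattern=$(printf '%s' "$crcs" | tr -s ' ' '|')
    # -r recurses into subdirectories, -n prints line numbers, -E enables "|".
    if grep -rnE -- "$pattern" "$@"; then
        return 0
    else
        status=$?
        # grep exits 1 for "no match" and 2 or more for real errors.
        if [ "$status" -eq 1 ]; then
            echo "no hits for: $crcs"
        fi
        return "$status"
    fi
}
```

An invocation along the lines of find_crc_hits "efe0c707 fa79e777" /usr/local/var/amanda/Daily would cover both values on the server; the same call with the client's debug directory would cover the other machine.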


Nathan






Re: did it again. -- crc differ

2020-11-30 Thread Nathan Stratton Treadway
On Mon, Nov 30, 2020 at 18:41:40 -0500, Gene Heskett wrote:
> > On Mon, Nov 30, 2020 at 12:46:46 -0500, Nathan Stratton Treadway wrote:
> > > I assume that the first few lines of the
> > > coyote._home_gene_Pictures.0 file are an Amanda header (including an
> > > XML chunk); can you post that here?
> >
> > Hmmm, it might also be useful to see the header from the
> > coyote._home_gene_Pictures.0.5 file (i.e. the last of the subparts) as
> > well
> >
> Try this:
> gene@coyote:sudo dd 
> if=/sdb/dumps/20201130020105/coyote._home_gene_Pictures.0.5 bs=32k count=1
> 
> AMANDA: CONT_FILE 20201130020105 coyote /home/gene/Pictures  
> lev 0 comp N program APPLICATION
> APPLICATION=amgtar
> DLE=

Re: did it again. -- crc differ

2020-11-30 Thread Nathan Stratton Treadway
On Mon, Nov 30, 2020 at 12:46:46 -0500, Nathan Stratton Treadway wrote:
> I assume that the first few lines of the coyote._home_gene_Pictures.0
> file are an Amanda header (including an XML chunk); can you post that
> here?


Hmmm, it might also be useful to see the header from the
coyote._home_gene_Pictures.0.5 file (i.e. the last of the subparts) as
well

Nathan




Re: did it again. -- crc differ

2020-11-30 Thread Nathan Stratton Treadway
On Mon, Nov 30, 2020 at 03:12:41 -0500, Gene Heskett wrote:
> Doing a level 0 on /home/gene/Pictures, it logged this in the email:
> 
>   coyote /home/gene/Pictures lev 0  partial taper: source server crc 
> (44cff778:11146117120) and input server crc (dfd0e83a:11146117120) 
> differ)
>   coyote /home/gene/Pictures lev 0  was successfully retried
> 
> It did leave a 10+ Gb file in the vtape, but left the failed files in the 
> holding disk:
> 
> root@coyote:~$ ls -l /sdb/dumps/20201130020105/
> total 10885096
> -rw--- 1 amanda amanda 2097152000 Nov 30 02:06  
> coyote._home_gene_Pictures.0

Would /home/gene/Pictures have changed any between the two retries?  If
not, you might learn something by comparing the components of the
successful dump with the files on the holding disk... (but offhand I'm
not sure how many red-herring differences you'd have to sort through to
find any hints as to the actual problem).


> 
> Does anyone have a clue what its really trying to tell me?

I only have some vague clues:

* the number after the ":" is the size of the file being CRCed.  In this
  case 11146117120 shows up for both sides of the comparison, so it
  seems like the full file got transferred across to whatever step is
  causing the error.  It also seems like this error applies to the
  entire 11GB dump rather than the individual 2GB parts.

* The message "source server crc ([...]) and input server crc" appears 
  to be generated in Amanda/Taper/Worker.pm:result_cb() in cases where
  $self->{'server_crc'} and $self->{'source_server_crc'} differ.

  $self->{'server_crc'} seems to be read out of the header of the dump
  file itself.

  $self->{'source_server_crc'} seems to be computed as part of
  transferring the file to the taper process, or something like that.

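To make the "crc:size" reading above concrete, here is a small sketch that pulls the two pairs out of a message and reports which half disagrees. It is not Amanda code: compare_crc_pairs is a made-up name, and the pattern assumes the pairs always look like eight hex digits, a colon, and a decimal size, as they do in this thread. It also relies on grep -o, which GNU and BSD grep provide.

```shell
# Hypothetical helper: split the two "crc:size" pairs out of an Amanda
# "crc ... differ" message and say whether sizes and CRCs agree.
compare_crc_pairs() {
    msg=$1
    # Extract every "8 hex digits:decimal" pair, one per line, in order.
    pairs=$(printf '%s\n' "$msg" | grep -oE '[0-9a-f]{8}:[0-9]+')
    first=$(printf '%s\n' "$pairs" | sed -n 1p)
    second=$(printf '%s\n' "$pairs" | sed -n 2p)
    if [ -z "$first" ] || [ -z "$second" ]; then
        echo "could not find two crc:size pairs" >&2
        return 2
    fi
    # ${x#*:} keeps the part after the colon, ${x%%:*} the part before it.
    if [ "${first#*:}" = "${second#*:}" ]; then
        size_state="sizes match"
    else
        size_state="sizes differ"
    fi
    if [ "${first%%:*}" = "${second%%:*}" ]; then
        crc_state="crcs match"
    else
        crc_state="crcs differ"
    fi
    echo "$size_state (${first#*:} vs ${second#*:}), $crc_state (${first%%:*} vs ${second%%:*})"
}
```

Fed the message from the report above, it prints "sizes match (11146117120 vs 11146117120), crcs differ (44cff778 vs dfd0e83a)", which is the situation described in the first bullet: the byte count agrees but the checksum does not.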
  


So I guess the next question is where in the multiple stages of the life
of the dump file the CRC mismatch gets introduced.

I assume that the first few lines of the coyote._home_gene_Pictures.0
file are an Amanda header (including an XML chunk); can you post that
here?

Also, what do you find when you grep the Amanda debug/log files for
those two CRC values ( 44cff778 and dfd0e83a )?


One other thought: have the reported CRC errors in the past also been
for the dump of the /home/gene/Pictures DLE, or are multiple different
DLEs affected?  Is it always level 0 dumps?

Nathan


