Re: DLE stuck on holdingdisk and being flushed on every dump

2017-11-02 Thread Jose M Calhariz
On Wed, Nov 01, 2017 at 05:03:58PM -0400, Jean-Louis Martineau wrote:
> Jose,
> 
> The report should report the error.
> The attached patch fixes it.

The patch applied cleanly and I am now compiling it.

Kind regards
Jose M Calhariz

-- 
--

The vanity of being known to be trusted with a secret is generally one of the 
chief motives to disclose it

--Samuel Johnson


Re: DLE stuck on holdingdisk and being flushed on every dump

2017-11-01 Thread Jean-Louis Martineau

Jose,

The report should report the error.
The attached patch fixes it.

Jean-Louis


diff --git a/perl/Amanda/Report/human.pm b/perl/Amanda/Report/human.pm
index 0b80219..e6a700d 100644
--- a/perl/Amanda/Report/human.pm
+++ b/perl/Amanda/Report/human.pm
@@ -667,7 +667,9 @@ sub output_error_summaries
 		push @dump_failures, "$hostname $qdisk lev $try->{chunker}->{level}  FAILED [$try->{chunker}->{error}]";
 		$failed = 1;
 		}
-		if (   exists $try->{taper} && exists $try->{dumper} && !exists $dle->{driver}
+		if (   exists $try->{taper}
+		&& ((exists $try->{dumper} && !exists $dle->{driver})
+			|| (!exists $try->{dumper} && !exists $dle->{driver}))
 		&& (   $try->{taper}->{status} eq 'fail'
 			|| (   $try->{taper}->{status} eq 'partial'))) {
 		my $flush = "FLUSH";
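
A reading of the patch, offered as a sketch rather than as a description of Amanda's internals: the old test only reached the taper failure branch when a dumper record existed for the same try, so an attempt with a taper record but no dumper record (presumably a flush from the holding disk) was skipped. The two new alternatives together cover both cases, which is logically the same as testing only `!exists $dle->{driver}`. The shell truth table below models just the dumper/driver part of the condition. The names old_cond and new_cond are hypothetical, and 1/0 stand for "key exists"/"key missing".

```sh
# Arguments to both functions: <dumper-exists> <driver-exists> (1 or 0).
# This models only the dumper/driver part of the Perl condition; the
# taper-exists and taper-status checks are unchanged by the patch.

# Condition before the patch: dumper record present, driver record absent.
old_cond() { [ "$1" -eq 1 ] && [ "$2" -eq 0 ]; }

# Condition after the patch: driver record absent, with or without a dumper.
new_cond() {
    { [ "$1" -eq 1 ] && [ "$2" -eq 0 ]; } ||
    { [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; }
}

# Print the truth table for all four combinations.
for dumper in 0 1; do
    for driver in 0 1; do
        old=no; new=no
        old_cond "$dumper" "$driver" && old=yes
        new_cond "$dumper" "$driver" && new=yes
        echo "dumper=$dumper driver=$driver old=$old new=$new"
    done
done
```

The only row where the two conditions differ is dumper=0 driver=0, which is the case the patch adds to the report.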


Re: DLE stuck on holdingdisk and being flushed on every dump

2017-11-01 Thread Jose M Calhariz
On Wed, Nov 01, 2017 at 10:47:10AM -0400, Jean-Louis Martineau wrote:
> Jose,
> 
> When Amanda did the backup, it computed a CRC of 6828c030.
> But it computed b8d66d3d when it read the file back from the holding disk to flush it.
> Do you see that error in the report for every flush attempt?

Yes, I see.


> 
> The holding disk file is not erased because of the CRC mismatch.
> You can manually remove it (amadmin holding list, amadmin holding 
> delete) from the holding disk and force a full backup of that DLE.
> 
> That's why I added the CRC; these kinds of errors went undetected
> before.

I have only seen this error because after an amdump I run:

/usr/bin/time /usr/sbin/amcheckdump ${CONF}
ls -alF /backup/amanda/${CONF}/data/ | tail

I have not seen any error in the report generated by amdump.  Shouldn't
there be a notice about this type of error?


> 
> Jean-Louis


Kind regards
Jose M Calhariz



-- 
--
The great pleasure of a dog is that you may make a fool of
yourself with him and not only will he not scold you, but
he will make a fool of himself too.
-- Samuel Butler


Re: DLE stuck on holdingdisk and being flushed on every dump

2017-11-01 Thread Gene Heskett
On Wednesday 01 November 2017 10:24:00 Jose M Calhariz wrote:

> Hi
>
> I am using amanda 3.5 on my personal server and started to investigate
> a strange problem on my amanda installation.
>
> Since around 10 days ago, when amdump runs during the night, it
> flushes the holding disk, always starting with the same big DLE.
> This means that after the DLE is written to the vTape, it is not deleted
> from the holding disk and will be flushed again the next night.
>
> I found an interesting error message in amdump.20171101031517. I think
> this error is related to the problematic DLE.  I also noticed that the
> file on the vTape is broken too; it generates an error from gzip.
>
> driver: result time 549.166 from taper0: PARTIAL worker0-0 00-1
> INPUT-ERROR TAPE-GOOD "6828c030:54131462843" "[sec 548.00 bytes
> 54131462843 kps 96464.883212 orig-kb 53706460]" "source server crc
> (6828c030:54131462843) and input server crc (b8d66d3d:54131462843)
> differ)" ""
>
> This problem possibly was caused by an error on SATA that caused the
> system disk to become read-only.
>
Is the SATA cable bright red plastic?

Replace it. In my experience since about 1970, when the 
J.A. Pan company started using it in CB radio mic cables, red cables have a 
lifetime of 3 to 4 years. Something in the red dye attacks the copper 
wire, turning it into a brownish powder. The acid test: take a pencil and 
move the cable half an inch while tailing the system log.  If the log 
blows up with drive resets, the cable is bad. Replace it with any OTHER color 
of cable.

Also, go to the drive maker's site and get that drive's latest firmware. 
Put it on a CD as an image, not as a file, and reboot to the CD; it should 
find the drive and update it.

The drive will be faster, sometimes dramatically, like from a write speed 
of 22 megs/second to about 135 megs/second. I have been using a 1TB 
Seagate for vtapes, and when I did the firmware update it was only 2 
weeks old and had already marked 25 bad sectors out. Most of a decade 
later, it's still running with the same reallocated sector count, but 
now has 70,000 spinning hours on it.

But I'm about to replace it as I've outgrown it. I have put off adding 
the last 2 machines I've added to my home network to the disklist, as df 
reports it's hovering in the 87% range and I got a disk full error from 
one big DLE 3 nights back. The drive is still 100% usable. Meanwhile, 
SSDs are getting dirt cheap, like $33 for a 64GB one that's 3x 
faster than spinning rust. I've already put those in two of my milling 
machines and intend to do the same for the rest of my machinery as time permits.  
For that duty, a 64GB SSD is only 28% used. The G-code to carve a part is not 
a huge file, so it will handily outlast me since I'm already 83.

> What logs should I investigate?  Because this is a personal
> installation with private data, I will send the logs to the interested
> persons, not to the list.
>
>
> Kind regards
> Jose M Calhariz


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)


Re: DLE stuck on holdingdisk and being flushed on every dump

2017-11-01 Thread Jean-Louis Martineau
Jose,

When Amanda did the backup, it computed a CRC of 6828c030.
But it computed b8d66d3d when it read the file back from the holding disk to flush it.
Do you see that error in the report for every flush attempt?

The holding disk file is not erased because of the CRC mismatch.
You can manually remove it (amadmin holding list, amadmin holding 
delete) from the holding disk and force a full backup of that DLE.

That's why I added the CRC; these kinds of errors went undetected before.
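
The idea described above (record a checksum when the image is written, recompute it when the image is read back, and refuse to delete the source on a mismatch) can be sketched in shell. This is only an illustration under stated assumptions: POSIX cksum stands in for whatever CRC Amanda computes internally, and crc_size and verify_before_delete are hypothetical helper names, not Amanda commands.

```sh
# Print "<crc-hex>:<size-in-bytes>" for a file, mimicking the crc:size
# shape seen in the driver log. ASSUMPTION: cksum is a stand-in here;
# it is not the CRC algorithm Amanda itself uses.
crc_size() {
    # "cksum < file" prints "<crc> <size>"; split it into $1 and $2.
    set -- $(cksum < "$1")
    printf '%08x:%s\n' "$1" "$2"
}

# Return 0 only if the file still matches the checksum recorded earlier.
verify_before_delete() {
    recorded=$1
    file=$2
    current=$(crc_size "$file")
    if [ "$recorded" = "$current" ]; then
        echo "match: safe to remove $file"
        return 0
    fi
    echo "mismatch: recorded $recorded, now $current; keeping $file"
    return 1
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
printf 'backup image\n' > "$tmp"
recorded=$(crc_size "$tmp")          # checksum taken at "dump" time
verify_before_delete "$recorded" "$tmp"

printf 'backup imagE\n' > "$tmp"     # same size, one byte changed
verify_before_delete "$recorded" "$tmp" || echo "file left in place"
rm -f "$tmp"
```

The second check fails even though the size is unchanged, which mirrors the log in this thread: both CRC values carry the same byte count, 54131462843, and only the checksum differs.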

Jean-Louis

This message is the property of CARBONITE, INC. and may contain confidential or 
privileged information.
If this message has been delivered to you by mistake, then do not copy or 
deliver this message to anyone.  Instead, destroy it and notify me by reply 
e-mail


DLE stuck on holdingdisk and being flushed on every dump

2017-11-01 Thread Jose M Calhariz

Hi

I am using amanda 3.5 on my personal server and started to investigate
a strange problem on my amanda installation.

Since around 10 days ago, when amdump runs during the night, it
flushes the holding disk, always starting with the same big DLE.
This means that after the DLE is written to the vTape, it is not deleted
from the holding disk and will be flushed again the next night.

I found an interesting error message in amdump.20171101031517. I think
this error is related to the problematic DLE.  I also noticed that the
file on the vTape is broken too; it generates an error from gzip.

driver: result time 549.166 from taper0: PARTIAL worker0-0 00-1 INPUT-ERROR 
TAPE-GOOD "6828c030:54131462843" "[sec 548.00 bytes 54131462843 kps 
96464.883212 orig-kb 53706460]" "source server crc (6828c030:54131462843) and 
input server crc (b8d66d3d:54131462843) differ)" ""
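
To spot this condition without reading the whole amdump log by eye, the two CRC values can be pulled out of a driver result line like the one above. The following is a minimal sketch that assumes the message wording is exactly as quoted in this mail; extract_crcs is a hypothetical helper, not part of Amanda.

```sh
# Extract the "source" and "input" CRC values from a driver result line.
# Prints "<source-crc> <input-crc>", or nothing if the line has no such message.
# ASSUMPTION: the message text matches the line quoted in this mail.
extract_crcs() {
    printf '%s\n' "$1" | sed -n \
        's/.*source server crc (\([0-9a-f]*\):[0-9]*) and input server crc (\([0-9a-f]*\):[0-9]*).*/\1 \2/p'
}

# The log line from this mail, joined back onto a single line.
line='driver: result time 549.166 from taper0: PARTIAL worker0-0 00-1 INPUT-ERROR TAPE-GOOD "6828c030:54131462843" "[sec 548.00 bytes 54131462843 kps 96464.883212 orig-kb 53706460]" "source server crc (6828c030:54131462843) and input server crc (b8d66d3d:54131462843) differ)" ""'

crcs=$(extract_crcs "$line")
src=${crcs% *}    # text before the space
inp=${crcs#* }    # text after the space
if [ -n "$crcs" ] && [ "$src" != "$inp" ]; then
    echo "CRC mismatch: source=$src input=$inp"
fi
```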

This problem possibly was caused by an error on SATA that caused the
system disk to become read-only.

What logs should I investigate?  Because this is a personal
installation with private data, I will send the logs to the interested
persons, not to the list.


Kind regards
Jose M Calhariz

-- 
--

The three best sounds in the world are: the voice of a loved one, the bubbling 
of water in the desert, and the clink of one gold coin striking another

--Arab proverb