Re: Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner

2023-07-27 Thread Lou Hafer

Jose,

Indeed I would be interested! Certainly worth a try.

 Thanks,
  Lou


On 2023-07-25 9:34 a.m., Jose M Calhariz wrote:

Hi,

If I understand your problem correctly, I ran into it in 3.5.1 and I have a
patch that fixes it, from the previous owner of amanda.  The patch has
been in use by amanda in Debian for several years.

I can publish the patch here if you are interested.

Kind regards
Jose M Calhariz


On Wed, Jul 19, 2023 at 09:34:21AM -0700, Lou Hafer wrote:

Nuno,

 Thanks for the reply! And apologies for being not quite clear.

 I'm quite sure the offending hosts are powered down, so no chance of
partial response. When I look at the planner..debug log, I can
see sendsize requests going out to the hosts that are powered up and
responsive, and I can see their responses arrive. There are two hosts
powered down, gallifrey and jpt. The requests go out to gallifrey, then jpt.
When the request to gallifrey times out, planner sees 255 status from SSH
and aborts with the EOF error. Doesn't even wait around for the timeout on
jpt.

 If I go back and look at some old logs, I can see planner continue past
the `EOF on read' error. So I'm really starting to think this is a new bug
in 3.5.3.

 For what it's worth, I'd interpret your error,

ERROR Request to MACHINE failed: Connection refused

as the machine was powered up and responsive but actively refused the
connection for some reason.

 I'm puzzled by another thing: we're using the same version of amanda
(3.5.3) and I run backups to disk, no tape drive involved, but I've never
seen the error you mention in April: backup aborts after first machine/disk
in disklist. The obvious difference is Fedora 37 versus Fedora 38, but
really that shouldn't cause this much difference in behaviour.

 Bah! Sometimes staying up-to-date is a bit painful. I'll see if anyone
else chimes in before I report this as a bug.

 Spent two weeks in Porto and the Douro Valley in Fall 2022. Loved the
country!

 Lou


On 2023-07-19 2:17 a.m., Nuno Dias wrote:

   Hi Lou,

   I'm using the same version as you, although on Fedora 37
(amanda-3.5.3-1.fc37.x86_64), and I don't see that behaviour. I have some
machines that are down and the rest of the backups were still made.

   In my case I have this

planner: ERROR Request to MACHINE failed: Connection refused

   From what you wrote, it seems gallifrey.ivriel is not down and is
responding, but has some problem reporting the size.

   Maybe this page will help

https://www.zmanda.com/knowledge-base/eof-on-read-error-from-a-client/

   Although if it is aborting the whole planner, that seems like a bug, or
there are other reasons for aborting the whole planner; maybe check that
etimeout is not set very low.

Cheers,
Nuno

On Tue, 2023-07-18 at 13:50 -0700, Lou Hafer wrote:

Folks,

   I've been using amanda for several years on a simple home network.
Hosts are often powered down. Up through amanda 3.5.2, this worked like
a charm. If the host didn't respond, it was simply skipped. Hosts that
responded were properly backed up.

   With amanda 3.5.3, the behaviour has changed. If a host doesn't
respond to the planner size request, the planner aborts the entire
backup with the error

     planner: ERROR Request to gallifrey.ivriel failed:
     EOF on read from gallifrey.ivriel

I've confirmed that my configuration is generally correct --- as long as
all hosts in the disklist respond to the size request, the backup succeeds.

Is this a bug? Do I need to change some parameter in my configuration to
persuade planner to soldier on? Any thoughts would be appreciated.

   As context, this problem came about with an upgrade from Fedora 37
to Fedora 38, with a matching upgrade from amanda 3.5.2 to amanda 3.5.3.

   Thanks,
   Lou












Re: Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner

2023-07-19 Thread Lou Hafer

Nuno,

Thanks for the reply! And apologies for being not quite clear.

I'm quite sure the offending hosts are powered down, so no chance 
of partial response. When I look at the planner..debug log, I 
can see sendsize requests going out to the hosts that are powered up and 
responsive, and I can see their responses arrive. There are two hosts 
powered down, gallifrey and jpt. The requests go out to gallifrey, then 
jpt. When the request to gallifrey times out, planner sees 255 status 
from SSH and aborts with the EOF error. Doesn't even wait around for the 
timeout on jpt.


If I go back and look at some old logs, I can see planner continue 
past the `EOF on read' error. So I'm really starting to think this is a 
new bug in 3.5.3.


For what it's worth, I'd interpret your error,

ERROR Request to MACHINE failed: Connection refused

as the machine was powered up and responsive but actively refused the 
connection for some reason.


I'm puzzled by another thing: we're using the same version of 
amanda (3.5.3) and I run backups to disk, no tape drive involved, but 
I've never seen the error you mention in April: backup aborts after 
first machine/disk in disklist. The obvious difference is Fedora 37 
versus Fedora 38, but really that shouldn't cause this much difference 
in behaviour.


Bah! Sometimes staying up-to-date is a bit painful. I'll see if 
anyone else chimes in before I report this as a bug.


Spent two weeks in Porto and the Douro Valley in Fall 2022. Loved 
the country!


Lou


On 2023-07-19 2:17 a.m., Nuno Dias wrote:

  Hi Lou,

  I'm using the same version as you, although on Fedora 37
(amanda-3.5.3-1.fc37.x86_64), and I don't see that behaviour. I have some
machines that are down and the rest of the backups were still made.

  In my case I have this

   planner: ERROR Request to MACHINE failed: Connection refused

  From what you wrote, it seems gallifrey.ivriel is not down and is
responding, but has some problem reporting the size.

  Maybe this page will help

https://www.zmanda.com/knowledge-base/eof-on-read-error-from-a-client/

  Although if it is aborting the whole planner, that seems like a bug, or
there are other reasons for aborting the whole planner; maybe check that
etimeout is not set very low.

Cheers,
Nuno

On Tue, 2023-07-18 at 13:50 -0700, Lou Hafer wrote:

Folks,

  I've been using amanda for several years on a simple home network.
Hosts are often powered down. Up through amanda 3.5.2, this worked like
a charm. If the host didn't respond, it was simply skipped. Hosts that
responded were properly backed up.

  With amanda 3.5.3, the behaviour has changed. If a host doesn't
respond to the planner size request, the planner aborts the entire
backup with the error

    planner: ERROR Request to gallifrey.ivriel failed:
    EOF on read from gallifrey.ivriel

I've confirmed that my configuration is generally correct --- as long as
all hosts in the disklist respond to the size request, the backup succeeds.

Is this a bug? Do I need to change some parameter in my configuration to
persuade planner to soldier on? Any thoughts would be appreciated.

  As context, this problem came about with an upgrade from Fedora 37
to Fedora 38, with a matching upgrade from amanda 3.5.2 to amanda 3.5.3.

  Thanks,
  Lou







Re: Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner

2023-07-19 Thread Nuno Dias

 Hi Lou,

 I'm using the same version as you, although on Fedora 37
(amanda-3.5.3-1.fc37.x86_64), and I don't see that behaviour. I have some
machines that are down and the rest of the backups were still made.

 In my case I have this

  planner: ERROR Request to MACHINE failed: Connection refused

 From what you wrote, it seems gallifrey.ivriel is not down and is
responding, but has some problem reporting the size.

 Maybe this page will help

https://www.zmanda.com/knowledge-base/eof-on-read-error-from-a-client/

 Although if it is aborting the whole planner, that seems like a bug, or
there are other reasons for aborting the whole planner; maybe check that
etimeout is not set very low.

Cheers,
Nuno 

On Tue, 2023-07-18 at 13:50 -0700, Lou Hafer wrote:
> Folks,
> 
>  I've been using amanda for several years on a simple home network.
> Hosts are often powered down. Up through amanda 3.5.2, this worked like
> a charm. If the host didn't respond, it was simply skipped. Hosts that
> responded were properly backed up.
> 
>  With amanda 3.5.3, the behaviour has changed. If a host doesn't 
> respond to the planner size request, the planner aborts the entire 
> backup with the error
> 
>    planner: ERROR Request to gallifrey.ivriel failed:
>    EOF on read from gallifrey.ivriel
> 
> I've confirmed that my configuration is generally correct --- as long as
> all hosts in the disklist respond to the size request, the backup succeeds.
> 
> Is this a bug? Do I need to change some parameter in my configuration to
> persuade planner to soldier on? Any thoughts would be appreciated.
> 
>  As context, this problem came about with an upgrade from Fedora 37
> to Fedora 38, with a matching upgrade from amanda 3.5.2 to amanda 3.5.3.
> 
>  Thanks,
>  Lou
> 

-- 
Nuno Dias 
LIP



Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner

2023-07-18 Thread Lou Hafer

Folks,

I've been using amanda for several years on a simple home network. 
Hosts are often powered down. Up through amanda 3.5.2, this worked like 
a charm. If the host didn't respond, it was simply skipped. Hosts that 
responded were properly backed up.


With amanda 3.5.3, the behaviour has changed. If a host doesn't 
respond to the planner size request, the planner aborts the entire 
backup with the error


  planner: ERROR Request to gallifrey.ivriel failed:
  EOF on read from gallifrey.ivriel

I've confirmed that my configuration is generally correct --- as long as 
all hosts in the disklist respond to the size request, the backup succeeds.


Is this a bug? Do I need to change some parameter in my configuration to 
persuade planner to soldier on? Any thoughts would be appreciated.


As context, this problem came about with an upgrade from Fedora 37 
to Fedora 38, with a matching upgrade from amanda 3.5.2 to amanda 3.5.3.


Thanks,
Lou
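

For anyone triaging the same symptom, here is a minimal sketch (not part of
Amanda) that pulls the failing hosts out of planner error lines of the form
quoted in this thread. It assumes each error sits on a single line shaped
like "planner: ERROR Request to HOST failed: REASON"; the function name
failed_requests and the sample lines are made up for illustration, and real
report or log layouts may differ.

```python
import re

# Assumed line shape, taken from the errors quoted in this thread:
#   planner: ERROR Request to <host> failed: <reason>
ERROR_RE = re.compile(r"ERROR Request to (?P<host>\S+) failed:\s*(?P<reason>.*)")


def failed_requests(lines):
    """Return (host, reason) pairs for planner request failures."""
    results = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            results.append((match.group("host"), match.group("reason").strip()))
    return results


if __name__ == "__main__":
    # Hypothetical sample input built from the two errors in the thread.
    sample = [
        "planner: ERROR Request to gallifrey.ivriel failed: "
        "EOF on read from gallifrey.ivriel",
        "planner: ERROR Request to MACHINE failed: Connection refused",
    ]
    for host, reason in failed_requests(sample):
        print(f"{host}: {reason}")
```

Grouping the output by reason makes it easier to tell a host that never
answered ("EOF on read") from one that actively refused the connection.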