Re: [OpenAFS] openafs-server does not recover from crash

2022-03-10 Thread Pascal Salet

On 09.03.22 at 19:09, Mark Vitale wrote:
> Pascal,
>
>> On 9 Mar 2022, at 11:26 AM, Pascal Salet  wrote:
>>
>> SalsrvLog:
>> Wed Mar 09 16:09:31 2022 @(#)OpenAFS 1.8.2-1-debian 2018-09-12
>
> Unfortunately this version of OpenAFS has the Rx CID bug:
>    http://openafs.org/frameset/dl/openafs/1.8.7/RELNOTES-1.8.7
>
> This bug may only become apparent on the first restart after Jan 15, 2021.
> You must upgrade all your clients and servers to OpenAFS 1.8.7 or higher.

Mark, thank you very much for your advice!

Upgrading to OpenAFS 1.8.6-5 (Debian) solved the problem.
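Note that 1.8.6-5 is numerically below the 1.8.7 named in Mark's reply, yet it resolved the problem here, so the Debian package apparently carries the fix as a backport. That matters when auditing a site by version string: a plain upstream-version comparison flags such packages even when the distribution has patched them, so a hit means "check the package changelog", not "definitely affected". The Python sketch below pulls the version out of a log banner like the SalsrvLog line quoted above and flags anything older than 1.8.7 for review. FIXED, banner_version and needs_review are hypothetical names for illustration, and the sketch only encodes the 1.8.x threshold stated in this thread.

```python
import re

# Threshold taken from this thread ("1.8.7 or higher"); assumption: 1.8.x branch only.
FIXED = (1, 8, 7)

def banner_version(line):
    """Extract (major, minor, patch) from a banner such as
    '@(#)OpenAFS 1.8.2-1-debian 2018-09-12'; return None if absent."""
    m = re.search(r"OpenAFS (\d+)\.(\d+)\.(\d+)", line)
    return tuple(int(x) for x in m.groups()) if m else None

def needs_review(line):
    """True if the upstream version is below FIXED.

    Distribution packages may backport the fix, so True means
    'check the package changelog', not 'vulnerable'."""
    v = banner_version(line)
    return v is not None and v < FIXED

print(needs_review("Wed Mar 09 16:09:31 2022 @(#)OpenAFS 1.8.2-1-debian 2018-09-12"))  # True
```

Under this check the Debian 1.8.6-5 package from this thread is also flagged, which is exactly the case where the changelog has to be consulted.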

Pascal

--
Pascal Salet
IT-Services / Server Infrastructure
Wirtschaftsuniversität Wien / Vienna University of Economics and Business / Austria

[email protected] / +43-676-8213-5375
___
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] openafs-server does not recover from crash

2022-03-09 Thread Mark Vitale
Pascal,

> On 9 Mar 2022, at 11:26 AM, Pascal Salet  wrote:
> 
> Our openafs-server stopped working after a crash.
> 
> "bos status" shows all services online for all fileservers and DBservers.
> 
> udebug on port 7003 works correctly from all fileservers and DBservers.
> 
> However, "strace -p $(pidof salvageserver)" shows an error:
> connect(6, {sa_family=AF_UNIX, sun_path="/var/lib/openafs/local/fssync.sock"}, 110) = -1 ECONNREFUSED (Connection refused)
> 
> SalsrvLog:
> Wed Mar 09 16:09:31 2022 @(#)OpenAFS 1.8.2-1-debian 2018-09-12

Unfortunately this version of OpenAFS has the Rx CID bug: 
  http://openafs.org/frameset/dl/openafs/1.8.7/RELNOTES-1.8.7

This bug may only become apparent on the first restart after Jan 15, 2021.
You must upgrade all your clients and servers to OpenAFS 1.8.7 or higher.

Regards,
--
Mark Vitale
[email protected]





[OpenAFS] openafs-server does not recover from crash

2022-03-09 Thread Pascal Salet

Hi,

Our openafs-server stopped working after a crash.

"bos status" shows all services online for all fileservers and DBservers.

udebug on port 7003 works correctly from all fileservers and DBservers.

However, "strace -p $(pidof salvageserver)" shows an error:
connect(6, {sa_family=AF_UNIX, sun_path="/var/lib/openafs/local/fssync.sock"}, 110) = -1 ECONNREFUSED (Connection refused)
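For readers hitting the same symptom: on Linux, ECONNREFUSED from connect() on an AF_UNIX path generally means a file exists at that path but no process is accepting connections on it (for example, a stale socket file), whereas a missing path yields ENOENT. The Python sketch below repeats the connect() the salvageserver made, so the state of fssync.sock can be checked by hand. It assumes FSSYNC uses a stream socket, and probe_unix_socket is a hypothetical helper, not an OpenAFS tool.

```python
import errno
import socket

def probe_unix_socket(path):
    """Repeat a connect() like the one in the strace output and classify it.

    Assumption: the peer is a SOCK_STREAM Unix socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return "listening"          # some process is accepting on this path
    except OSError as e:
        if e.errno == errno.ECONNREFUSED:
            return "refused"        # file exists, but nothing is listening
        if e.errno == errno.ENOENT:
            return "missing"        # no file at this path
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()

if __name__ == "__main__":
    print(probe_unix_socket("/var/lib/openafs/local/fssync.sock"))
```

A "refused" result while bos reports the fileserver as running points at the fileserver side of the FSSYNC circuit rather than at the salvageserver or volserver.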


SalsrvLog:
Wed Mar 09 16:09:31 2022 @(#)OpenAFS 1.8.2-1-debian 2018-09-12
Wed Mar 09 16:09:31 2022 Starting OpenAFS Online Salvage Server 2.4 (/usr/lib/openafs/salvageserver)
Wed Mar 09 16:10:57 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 16:11:29 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 16:12:09 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)

SYNC_connect failed (giving up!): Connection refused
Wed Mar 09 16:12:57 2022 Unable to connect to file server; aborted

FileLog:
Wed Mar 09 15:57:41 2022 VL_RegisterAddrs rpc failed; will retry periodically (code=-1, err=0)
Wed Mar 09 16:03:31 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Wed Mar 09 16:06:56 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Wed Mar 09 16:10:21 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Wed Mar 09 16:13:46 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.


BosLog:
Wed Mar  9 16:06:05 2022: dafs started pid 5912: /usr/lib/openafs/salvageserver

Wed Mar  9 16:09:31 2022: dafs:salsrv exited with code 1
Wed Mar  9 16:09:31 2022: dafs started pid 6757: /usr/lib/openafs/salvageserver

Wed Mar  9 16:12:57 2022: dafs:salsrv exited with code 1
Wed Mar  9 16:12:57 2022: dafs started pid 7635: /usr/lib/openafs/salvageserver

Wed Mar  9 16:16:23 2022: dafs:salsrv exited with code 1
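Read together, these BosLog entries show a restart loop: bosserver starts the salvageserver, it exits with code 1, and it is started again. Each run lasts the same 206 seconds, which matches the SalsrvLog run from 16:09:31 to the "aborted" message at 16:12:57. A minimal Python sketch that pairs the start and exit lines and computes each lifetime; BOSLOG is the excerpt above with blank lines dropped, and the helper names are illustrative only.

```python
import re
from datetime import datetime

# BosLog excerpt from this post, one entry per line (blank lines dropped).
BOSLOG = """\
Wed Mar  9 16:06:05 2022: dafs started pid 5912: /usr/lib/openafs/salvageserver
Wed Mar  9 16:09:31 2022: dafs:salsrv exited with code 1
Wed Mar  9 16:09:31 2022: dafs started pid 6757: /usr/lib/openafs/salvageserver
Wed Mar  9 16:12:57 2022: dafs:salsrv exited with code 1
Wed Mar  9 16:12:57 2022: dafs started pid 7635: /usr/lib/openafs/salvageserver
Wed Mar  9 16:16:23 2022: dafs:salsrv exited with code 1
"""

# Timestamp, then ": ", then the message; the day may be space-padded ("Mar  9").
LINE_RE = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}): (.*)$")

def parse_time(stamp):
    # Collapse runs of spaces so strptime sees "Wed Mar 9 16:06:05 2022".
    return datetime.strptime(" ".join(stamp.split()), "%a %b %d %H:%M:%S %Y")

def salsrv_lifetimes(log_text):
    """Seconds between each salvageserver start and the following salsrv exit."""
    lifetimes = []
    started = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        when, msg = parse_time(m.group(1)), m.group(2)
        if "started pid" in msg and msg.endswith("salvageserver"):
            started = when
        elif "salsrv exited" in msg and started is not None:
            lifetimes.append(int((when - started).total_seconds()))
            started = None
    return lifetimes

print(salsrv_lifetimes(BOSLOG))  # [206, 206, 206]
```

A constant lifetime like this suggests each instance runs through its FSSYNC retries, gives up, and is restarted by bosserver, rather than crashing at random points.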

VolserLog:
Wed Mar 09 15:50:22 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 15:50:54 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 15:51:34 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)

SYNC_connect failed (giving up!): Connection refused
Wed Mar 09 15:52:21 2022 Unable to connect to file server; will retry at need


I would be very grateful for any advice on this matter.

Pascal

--
Pascal Salet
IT-Services / Server Infrastructure
Wirtschaftsuniversität Wien / Vienna University of Economics and Business / Austria

[email protected] / +43-676-8213-5375