Hi Mark Vitale,

Thank you for your fast reply.
I understand that mount points are kept in the volume itself, not in the VLDB.
I tried stopping and starting the VLDB servers before replacing vldb.DB0 from backup, but it didn't help.
Somehow, the issue was resolved by replacing the VLDB from backup.

>What version of AFS are you using for your vlservers, fileservers, and cache managers (clients)? And what operating system and version do your clients run on?

I'm using Ubuntu 16.04 x64 on both servers and clients.

Linux 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
openafs-client 1.6.18-1ubuntu1
openafs-fileserver 1.6.18-1ubuntu1
openafs-krb5 1.6.18-1ubuntu1


>Are you running vldb_check against a live VLDB?
Checking the live VLDB, which points to the wrong mount point:

$ vldb_check -database /var/lib/openafs/db/vldb.DB0

address 0xhhhhhhh  root.cell (xxxxxxxxxx) has no RW volume

But checking the backup VLDB from 12 hours earlier shows no error:

$ vldb_check -database /backup/openafs/live/db/vldb.DB0.yyyymmdd-hhmm

The result shows no issues.
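
To compare the two reports, the flagged volume names can be pulled out of the vldb_check output. This is a minimal sketch, assuming every affected line has the same shape as the one shown above ("address ... NAME (ID) has no RW volume"); no_rw_volumes is a hypothetical helper, not an OpenAFS tool.

```shell
#!/bin/sh
# Sketch: list volume names that a vldb_check report flags as
# "has no RW volume". Reads the report on stdin.
# Assumes the line format quoted in this message; no_rw_volumes is a
# hypothetical helper name.
no_rw_volumes() {
    awk '/has no RW volume/ {
        # The volume name is the field just before the "(id)" field.
        for (i = 2; i <= NF; i++)
            if ($i ~ /^\(.*\)$/) { print $(i - 1); break }
    }' | sort -u
}
```

It could be fed with something like `vldb_check -database /var/lib/openafs/db/vldb.DB0 2>&1 | no_rw_volumes`, once for the live file and once for the backup copy, and the two lists compared.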


>Do the VLDB entries for the apparently corrupted volumes change frequently?

No. We have 200-300 volume creates and releases daily. According to our event log, this has happened 2 times in the past 3 years.

>Are you taking any steps to ensure the VLDB is not changing when you back it up?

Oh, I'm not preventing this. I just copy the live VLDB from the sync site (/var/lib/openafs/db/vldb.DB0) and send it to a non-AFS backup server.
In our emergency recovery tests, backups made this way have always worked.
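
One way to at least notice a copy taken while the file was being written is to checksum the source before and after copying and retry on a mismatch. This is a minimal sketch under that assumption; snapshot_file and its arguments are hypothetical names, and the check only detects changes during the copy. It does not prove the database was internally consistent at that instant.

```shell
#!/bin/sh
# Sketch: copy a file and retry if the source changed during the copy.
# snapshot_file SRC DEST [TRIES] is a hypothetical helper, not an OpenAFS tool.
snapshot_file() {
    src=$1
    dest=$2
    tries=${3:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        before=$(cksum < "$src") || return 1
        cp "$src" "$dest.tmp" || return 1
        after=$(cksum < "$src") || return 1
        copied=$(cksum < "$dest.tmp") || return 1
        # Accept the copy only if the source was stable and the copy matches it.
        if [ "$before" = "$after" ] && [ "$before" = "$copied" ]; then
            mv "$dest.tmp" "$dest"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    rm -f "$dest.tmp"
    return 1
}
```

Running vldb_check -database against the resulting copy before sending it to the backup server would be a further sanity check on each hourly backup.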

>Could you provide more details about the steps you take to recover your VLDB?

Stop the openafs-fileserver service on all 3 VLDB servers:
$ service openafs-fileserver stop

Remove vldb.DB0 from /var/lib/openafs/db/:
$ rm /var/lib/openafs/db/vldb.DB0

Copy the backup vldb.DB0 from the non-AFS backup server to the VLDB server.
Repeat this step on all 3 VLDB servers:
$ scp backupserver:/backup/openafs/live/db/vldb.DB0.yyyymmdd-hhmm /var/lib/openafs/db/

Start the openafs-fileserver service on all 3 VLDB servers:
$ service openafs-fileserver start

Wait 2-3 minutes and check that a sync site has been elected.
At this point everything is back to normal.
We then ran syncserv and syncvldb to bring the VLDB up to date with the actual volumes on each file server.
After that, the volumes on all 13 servers are in sync with the VLDB.
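
The sync site check in the steps above can be sketched with udebug. This assumes the vlserver listens on its usual port 7003 and that udebug prints a line beginning "I am sync site" on the elected server; find_sync_site is a hypothetical helper and the host names passed to it are placeholders.

```shell
#!/bin/sh
# Sketch: print the first database server that claims to be the VLDB sync site.
# find_sync_site HOST... is a hypothetical helper; hosts are placeholders.
find_sync_site() {
    for host in "$@"; do
        # 7003 is the vlserver port; the matched text is assumed udebug output.
        if udebug -server "$host" -port 7003 2>/dev/null |
            grep -q '^I am sync site'; then
            echo "$host"
            return 0
        fi
    done
    # No server reported itself as sync site (quorum not yet established).
    return 1
}
```

Once a host is printed, vos syncvldb -server and vos syncserv -server can be run for each file server, as described above.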

At this point my concern is what might cause this to happen, so that I can look for ways to prevent it.


Best regards,

Pommm

On 2/21/19 9:34 PM, Mark Vitale wrote:
Pomm,

Thank you for your report.  Could you provide some more details (inline below)?

On Feb 21, 2019, at 4:58 AM, Thossaporn (Pommm) Phetruphant <[email protected]> wrote:

I have 3 vldb/pts servers and 13 file servers in my network. All are on the 
same subnet, same location.
We have encountered 2nd time of corrupted VLDB where when 'cd' into a mount 
point it go difference volume.

Example:
live.D1 mount at /afs/domain/live/data1
live.D2 mount at /afs/domain/live/data2
root.cell is at /afs/domain


cd /afs/domain/live/data1

'fs exa . ' show volume named 'live.D2' mounted at this mount point

'ls' show data in data2

or

cd /afs/domain

'fs exa . ' show volume named 'live.D1' mounted at this mount point

'ls' show data in data1

Mount point information is stored in the fileserver vice partitions, not in the VLDB.
What version of AFS are you using for your vlservers, fileservers, and cache 
managers (clients)?
And what operating system and version do your clients run on?

<snip>

'vldb_check -database /var/lib/openafs/db/vldb.DB0'  show 'root.cell 
(xxxxxxxxxx) has no RW volume'  and ~10 volumes also 'has no RW volume'

So, I have backup of VLDB hourly, so it can be recovered fast enough but it is 
2nd time that this happen.
Is anyone known why this would happen?  How can we prevent it?

Are you running vldb_check against a live VLDB?
Do the VLDB entries for the apparently corrupted volumes change frequently?
Are you taking any steps to ensure the VLDB is not changing when you back it up?
Could you provide more details about the steps you take to recover your VLDB?

Regards,
--
Mark Vitale
[email protected]



