Hi William: If your VLDB is still in this state, you can create a dummy cell on a client machine and then use the command
vos syncvldb <fileserver in real cell> -cell <dummy cell>
to rebuild the VLDB for the main cell.
Once you have run it against all of the fileservers, you can
then place this VLDB from the dummy cell onto the DB server
machines for your real cell.
The VLDB does not have any cellname-specific
information in it, so creating it in a dummy cell is not
a problem.
As long as you are not making changes to the VLDB while
you are rebuilding the VLDB, this will work.
=====
This is what I did on an AFS client machine only.
- Created /usr/afs/[etc,db,local] directories
- Created /usr/afs/etc/CellServDB and ThisCell files
gyro# more CellServDB
>dummycell.com #Cell name
158.43.11.175 #gyro.dummycell.com
gyro# more ThisCell
dummycell.com
- Added the cell to the client CellServDB file and ran
fs newcell
fs newcell dummycell.com gyro.dummycell.com
** use your client name for the DB server entry
- Placed the new cell in NoAuth mode
touch /usr/afs/local/NoAuth
- Started the vlserver on this machine
/usr/afs/bin/vlserver &
- Got tokens in the main cell that I wanted to duplicate the
VLDB for
- Ran the "vos syncvldb" command.
/usr/afs/bin/vos syncvldb pork -cell dummycell.com -noauth -verb
My test cell is dummycell.com and pork is a fileserver in the main
cell. This command created the new entries in the VLDB for
the dummycell.com cell.
Running this against all of your fileservers will probably take some time
because of the number of volumes in your cell. But as long as no VLDB
updates are going on while you're building this new VLDB, it will be
current with your site when it is finished.
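
The dummy-cell steps above can be sketched as a small script. This is only a sketch under stated assumptions, not something I have run against a production cell: the ROOT prefix, the DRY_RUN switch and the FILESERVERS list are hypothetical helpers, while the cell name, host, address and the vos syncvldb arguments are the ones from the example above. Adding the cell to the client CellServDB, running fs newcell, starting the vlserver and getting tokens are still done by hand. By default it only prints the vos commands.

```shell
#!/bin/sh
# Sketch only. ROOT, DRY_RUN and FILESERVERS are illustrative assumptions;
# the cell name, host and address come from the example in this message.
ROOT="${ROOT:-}"                    # empty = real /usr/afs; set to a scratch dir to try it
DUMMY_CELL="dummycell.com"
DB_HOST="gyro.dummycell.com"
DB_ADDR="158.43.11.175"
FILESERVERS="${FILESERVERS:-pork}"  # space-separated fileservers of the REAL cell

# Create the server-side directories and config files for the dummy cell.
setup_dummy_cell() {
    mkdir -p "$ROOT/usr/afs/etc" "$ROOT/usr/afs/db" "$ROOT/usr/afs/local"
    printf '>%s\t#Cell name\n%s\t#%s\n' "$DUMMY_CELL" "$DB_ADDR" "$DB_HOST" \
        > "$ROOT/usr/afs/etc/CellServDB"
    printf '%s\n' "$DUMMY_CELL" > "$ROOT/usr/afs/etc/ThisCell"
    touch "$ROOT/usr/afs/local/NoAuth"   # put the dummy cell in NoAuth mode
}

# Run (or, by default, just print) one vos syncvldb per real-cell fileserver.
sync_all() {
    for fs in $FILESERVERS; do
        set -- /usr/afs/bin/vos syncvldb "$fs" -cell "$DUMMY_CELL" -noauth -verb
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$@"
        else
            "$@"
        fi
    done
}
```

Calling setup_dummy_cell with ROOT pointed at a scratch directory lets you inspect the generated files first. Calling sync_all with FILESERVERS set to your own server list prints one command per fileserver; set DRY_RUN=0 only once the vlserver for the dummy cell is running.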
- Then you can get ready to place this VLDB onto your main
Database server machines.
You should stop the vlservers on your main DB servers.
Save a copy of your current, corrupted VLDB
Copy the new VLDB into the /usr/afs/db dir
Restart the vlservers
And the new VLDB should be OK.
If you get everything in place before you stop the
vlservers, you should be able to stop the vlservers, copy
the new VLDB and restart the vlservers before anything
times out, so there should be no downtime.
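
The swap itself can be written down ahead of time so that the window with the vlservers stopped stays short. The following is a rough sketch that only prints the steps for one DB server and runs nothing. It rests on assumptions that are not in the steps above: that the vlserver runs under the bosserver as an instance named "vlserver", that the commands are run as root on the DB server itself (hence -localauth), and that the database consists of the files vldb.DB0 and vldb.DBSYS1 in /usr/afs/db. The host name, source directory and backup suffix are hypothetical arguments.

```shell
#!/bin/sh
# Sketch only: prints the swap steps for one DB server, executes nothing.
# Assumptions: bos instance name "vlserver"; VLDB files vldb.DB0 and
# vldb.DBSYS1; commands are meant to be run locally on the DB server.
DB_DIR="/usr/afs/db"

print_swap_plan() {
    host="$1"      # DB server in the real cell (hypothetical name)
    new_dir="$2"   # directory holding the VLDB built in the dummy cell
    stamp="$3"     # suffix for the saved copy of the corrupted VLDB
    echo "bos shutdown $host vlserver -wait -localauth"
    for f in vldb.DB0 vldb.DBSYS1; do
        echo "mv $DB_DIR/$f $DB_DIR/$f.$stamp"   # keep the corrupted copy
        echo "cp $new_dir/$f $DB_DIR/$f"         # drop in the rebuilt copy
    done
    echo "bos startup $host vlserver -localauth"
}
```

Review the printed plan for each DB server before running any of it, and adjust it if your vlservers are started some other way.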
Thanks
Todd DeSantis
William Setzer <[EMAIL PROTECTED]csu.edu>
Sent by: openafs-info-admi[EMAIL PROTECTED]
09/12/2008 04:50 PM
Please respond to [EMAIL PROTECTED]su.edu
To: [email protected]
cc:
Subject: [OpenAFS] VLDB problem - Duplicate entries
[ If you see this twice, I apologize. I sent it to an old address
without noticing, so I hope it got eaten. ]
We've been investigating why our "vos backupsys" processes have been
hanging, and have discovered something disturbing. Upon dumping out
our VLDB via "vos listvldb > foo" it appears our VLDB has been
corrupted. We're seeing two entries for a significant percentage
(1/4) of our volumes:
adm.db
    RWrite: 536899559 Backup: 536899561
    number of sites -> 1
       server A.ncsu.edu partition /vicepa RW Site

adm.db
    RWrite: 536899559 Backup: 536899561
    number of sites -> 1
       server A.ncsu.edu partition /vicepa RW Site
Right now, it's never more than two instances per volume, and
sometimes they point to the same server, sometimes they point to
different servers.
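
A quick way to count such duplicates in a saved "vos listvldb" dump is sketched below. It assumes, based only on the excerpt above, that each entry begins with the volume name alone on its own line and that every other line has more than one field; the function name is hypothetical and the output format of other versions may differ.

```shell
#!/bin/sh
# Sketch only. Assumes a volume name is the only single-field line in the
# dump, as in the excerpt above; other lines (RWrite:, server ...) have more.
find_duplicate_entries() {
    # $1 = file produced by: vos listvldb > file
    awk 'NF == 1 { seen[$1]++ }
         END { for (v in seen) if (seen[v] > 1) print seen[v], v }' "$1" |
        sort -k2
}
```

Running find_duplicate_entries foo prints one line per volume name that appears more than once, prefixed by how many entries it has. It does not tell you whether the duplicates point to the same server.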
Our first thought is to do a "vos syncvldb"/"vos syncserv", but we
don't know if this will fix the problem, particularly in the case of
duplicate entries pointing to the same place. Our second thought is
to do it after zeroing out the VLDB, but the downtime we'd suffer
isn't very appealing. :) Our third thought is that we might have a
more serious corruption, since we had a problem with our VLDB several
months ago (which we thought we had fixed).
Right now, everything appears to be working "normally", excepting the
"vos backupsys" being very cranky about a large number of non-existent
volumes, but clearly something needs to be done and we're pretty much
out of our depth.
Our current OpenAFS version is 1.2.13, but our upgrade path to 1.4.7
was in progress when interrupted by this problem. (We were starting
with file servers, so the databases are still at 1.2.13.)
So what do you think would be the safest and/or best course of action
to take? Thanks in advance for your advice.
William Setzer
Systems & Hosted Services
Office of Information Technology
NC State University
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
