Hi Phil,
First, sorry for taking so long to respond to this email. It got
flagged as important, but then I lost track of the important stuff
somewhere along the way. I've started looking at your validate tool,
and I wanted to ask if you guys have made any significant changes to
what you have here. Also, I went ahead and committed the
mgmt-getconfig and sys-getattr-other-objects patches to HEAD. They look
useful and don't really mess with any of our current code paths.
-sam
On Jun 7, 2006, at 11:47 AM, Phil Carns wrote:
This is a work in progress, but we wanted to go ahead and share
some patches to see if anyone has any comments, etc.
The last patch in this email (pvfs2-validate.patch) implements a
tool similar to pvfs2-fsck that takes a different approach and adds
some different functionality. The first two patches are
independent changes to PVFS2 functionality that make it a little
easier to look for file system problems.
mgmt-getconfig.patch:
---------------
This patch adds a new PVFS_mgmt_getconfig() function to the
management interface. It allows you to retrieve the file system
and server configuration files verbatim from any arbitrary server
in a file system. No protocol changes are necessary because it
uses the existing getconfig operation. It also reuses most of the
state machine that the client normally uses for retrieving
configuration data, with some modifications that allow it to
preserve the text buffers in this case. This function is a building
block for being able to confirm that the server configuration
settings are consistent.
sys-getattr-other-objects.patch:
----------------
This patch makes it safe to call PVFS_sys_getattr() directly on
underlying objects in a file system (such as datafiles), rather
than just files, directories, and symlinks. This is useful for
confirming valid attributes on individual objects. Only two
changes were needed:
- making sure that only metafiles, dirs, and symlinks get added to
the acache (attribute cache)
- copying the size out for datafiles
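As a rough illustration of the two changes above (not the actual patch, which is C code inside the PVFS2 client), here is a minimal Python sketch. The names ObjType, CACHEABLE, finish_getattr, and the dict-based response and cache are all hypothetical stand-ins chosen for this example.

```python
from enum import Enum, auto


class ObjType(Enum):
    """Hypothetical stand-ins for PVFS2 object types."""
    METAFILE = auto()
    DIRECTORY = auto()
    SYMLINK = auto()
    DATAFILE = auto()


# Only these object types are allowed into the attribute cache.
CACHEABLE = {ObjType.METAFILE, ObjType.DIRECTORY, ObjType.SYMLINK}


def finish_getattr(handle, resp, acache):
    """Build the attributes returned to the caller from a server response.

    resp is a dict with at least a 'type' key; datafile responses are
    assumed to also carry a 'size' key. acache is a plain dict standing
    in for the attribute cache.
    """
    out = {"type": resp["type"]}
    if resp["type"] is ObjType.DATAFILE:
        # Change 2: copy the size out for datafiles.
        out["size"] = resp["size"]
    if resp["type"] in CACHEABLE:
        # Change 1: never cache datafiles or other underlying objects.
        acache[handle] = out
    return out
```

With this sketch, a getattr on a datafile returns its size but leaves the cache untouched, while a getattr on a directory is cached as before.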
pvfs2-validate.patch:
----------------
This adds a command line tool called pvfs2-validate that is modeled
after pvfs2-fsck, but diverges in a few ways:
- It uses a new fsck-utils API (PVFS_fsck_XXX()) to do most of the
work. This API includes functions for validating various types of
PVFS2 objects and checking for problems. The PVFS_fsck_XXX()
functions just make normal PVFS_sys_XXX() and PVFS_mgmt_XXX()
calls under the covers. It includes functions that can walk
directory trees as well, so that you can just call one top level
function to validate a full tree. This API could possibly be
reused by other admin tools.
- It can be run on individual directory trees or files rather than
the entire file system. This is helpful for diagnosing particular
suspected problem areas on the file system when you don't have time
to run a full fsck on a large file system. Running pvfs2-validate
on any directory other than root disables the check for stranded
objects, however, because it has to parse the whole tree to make
sure that all objects are accounted for.
- It can do some basic configuration file sanity checking. To do
this, it retrieves the fs.conf from all servers, strips out
whitespace and comments, and then looks for differences. If any
discrepancies are found, it prints a warning and indicates which
particular servers appear to be using a different fs.conf and what
the actual difference is (using the first server as the "golden"
model).
- It can (optionally) check for bad practice. These are issues
that, while not strictly a file system problem, can cause confusion
for end users. Examples include special characters (like * or ?)
in file names, and relative symbolic links that leave the file
system (because these will break if the mount point is changed).
- As part of its validity checking, it makes sure that all
attribute values returned for each object are sane, rather than
just confirming their existence. For example, does each file have
at least one datafile, are symlink targets non-null, is size > 0,
are the object types correct, etc.
- Default output is mostly silent, rather than printing diagnostic
information for every valid object encountered.
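To make two of the checks above more concrete, here is a minimal Python sketch of the configuration comparison and the "bad practice" checks. It is not taken from pvfs2-validate.patch. All function names are hypothetical, '#' is assumed to start a comment, whitespace is collapsed rather than removed entirely, and the symlink check assumes the caller knows the mount point and the absolute directory holding the link.

```python
import difflib
import posixpath


def normalize_config(text):
    """Drop comments and blank lines, and collapse whitespace runs."""
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]    # assumption: '#' starts a comment
        line = " ".join(line.split())  # collapse runs of whitespace
        if line:
            lines.append(line)
    return lines


def compare_configs(configs):
    """configs is a list of (server_name, fs_conf_text) pairs.

    The first entry is treated as the "golden" model. Returns a dict
    mapping each differing server to its diff against the golden copy.
    """
    golden_name, golden_text = configs[0]
    golden = normalize_config(golden_text)
    mismatches = {}
    for name, text in configs[1:]:
        candidate = normalize_config(text)
        if candidate != golden:
            mismatches[name] = list(difflib.unified_diff(
                golden, candidate,
                fromfile=golden_name, tofile=name, lineterm=""))
    return mismatches


def has_special_chars(name, special="*?"):
    """True if a file name contains characters such as * or ?."""
    return any(ch in special for ch in name)


def relative_link_leaves_fs(link_dir, target, mount_point):
    """True if a relative symlink target resolves outside mount_point.

    link_dir is the absolute directory that contains the symlink.
    Absolute targets are not considered by this check.
    """
    if posixpath.isabs(target):
        return False
    resolved = posixpath.normpath(posixpath.join(link_dir, target))
    mount = posixpath.normpath(mount_point)
    inside = resolved == mount or resolved.startswith(mount.rstrip("/") + "/")
    return not inside
```

For example, compare_configs() given three servers where only the third has a changed setting would report just that server, along with the differing lines. The symlink check works purely on path strings, so it does not need the file system to be mounted.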
However, there are a few things that pvfs2-validate does _NOT_ do
at this time:
- it does not make any attempt to repair problems; it only reports
them
- it still needs more strenuous testing
Comments and suggestions are welcome...
-Phil
<mgmt-getconfig.patch.gz>
<pvfs2-validate.patch.gz>
<sys-getattr-other-objects.patch.gz>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers