Another option to consider is the technique described in pvfs2/doc/db-recovery.txt.  It describes how to dump and reload two types of db files; the latter (dataspace_attributes.db) is the one you want in this case.  Please make a backup copy of the original .db file if you try this.

One thing to look out for that isn't mentioned in the doc is that the rebuilt dataspace_attributes.db will probably be _much_ smaller than the original.  This doesn't mean that data was lost; it's just that Berkeley DB packs the file much more efficiently when all of the entries are rebuilt at once.
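
For anyone who wants a feel for what the dump half of that procedure does, here is a minimal C sketch.  It is only an illustration, not the procedure from db-recovery.txt: it uses Berkeley DB's DB->verify() call with the DB_SALVAGE flag to write whatever key/data pairs are still readable into a dump file.  The file names are placeholders, and it should only ever be pointed at a backup copy:

    /*
     * salvage_dump.c - illustrative sketch only; follow the procedure in
     * pvfs2/doc/db-recovery.txt for the real thing, and run this against
     * a backup copy.  Writes the readable key/data pairs of a possibly
     * corrupt database to a dump file in db_dump format.
     *
     * Build with:  cc -o salvage_dump salvage_dump.c -ldb
     */
    #include <stdio.h>
    #include <db.h>

    int main(void)
    {
        DB *dbp = NULL;
        FILE *out;
        int ret;

        /* output file name is a placeholder */
        out = fopen("dataspace_attributes.dump", "w");
        if (out == NULL) {
            perror("fopen");
            return 1;
        }

        if ((ret = db_create(&dbp, NULL, 0)) != 0) {
            fprintf(stderr, "db_create: %s\n", db_strerror(ret));
            return 1;
        }

        /* DB_SALVAGE writes every key/data pair that can still be read;
         * verify() destroys the handle whether or not it succeeds, so
         * dbp must not be used (or closed) afterwards. */
        ret = dbp->verify(dbp, "dataspace_attributes.db.bak", NULL,
                          out, DB_SALVAGE);
        if (ret != 0)
            fprintf(stderr, "salvage: %s\n", db_strerror(ret));

        fclose(out);
        return ret == 0 ? 0 : 2;
    }

The output is in the same format that "db_dump -r" produces, so db_load can turn it back into a fresh database; that reload step is where the repacking (and the much smaller file) comes from.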

-Phil

On 04/02/2012 01:09 PM, Jim Kusznir wrote:
Thanks Boyd:

We have 3 io servers, each also running metadata servers.  One will
not come up (that's the 3rd server).  I did try to run the db check
command (I forget the specifics), and it did return a single chunk of
entries that are not readable.  As you may guess from the above, I've
never interacted with bdb on a direct or low level.  I don't have a
good answer for #3; I noticed about 1/3 of the directory entries were
"red" on the terminal, and several individuals contacted me with pvfs
problems.

I will begin building new versions of bdb.  Do I need to install this
just on the servers, or do the clients need it as well?

--Jim

On Sun, Apr 1, 2012 at 4:03 PM, Boyd Wilson <[email protected]> wrote:
Jim,
We have been discussing your issue internally.   A few questions:
1. How many metadata servers do you have?
2. Do you know which one is affected (if there is more than one)?
3. How much of the file system can you currently see?

The issue you mentioned seems to be the one we have seen with earlier
versions of BerkeleyDB; as Becky mentioned, we have not seen it with the
newer versions.  In our discussions we couldn't recall whether we ever
tried low-level BDB access to the metadata to back up the unaffected
entries so they could be restored into a new BDB.  If you are comfortable
with lower-level BDB commands, you may want to see whether you can read
the entries up to the corruption and after it.  If you can do both, you
may be able to write a small program to read all of the entries out into
a file or another BDB, then rebuild the BDB with only the valid entries
(see the sketch below).
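
To make that concrete, below is a rough sketch of such a program, assuming the target is dataspace_attributes.db opened as a plain btree.  PVFS's trove layer may expect its own key ordering or extra metadata in the rebuilt file, so treat this purely as an illustration of the cursor-copy idea, run it only against a backup copy, and defer to pvfs2/doc/db-recovery.txt for the supported rebuild procedure.  It walks the damaged database with a cursor, copying every key/data pair it can still read into a new database, and reports where it stops if it hits the corrupt region:

    /*
     * cursor_copy.c - hedged sketch of the "read out the valid entries
     * and rebuild" idea; not from the PVFS source tree.  File names and
     * the btree access method are assumptions.
     *
     * Build with:  cc -o cursor_copy cursor_copy.c -ldb
     */
    #include <stdio.h>
    #include <string.h>
    #include <db.h>

    int main(void)
    {
        DB *src = NULL, *dst = NULL;
        DBC *cur = NULL;
        DBT key, data;
        unsigned long copied = 0;
        int ret;

        /* open the damaged database read-only (work on a copy) */
        db_create(&src, NULL, 0);
        ret = src->open(src, NULL, "dataspace_attributes.db.bak", NULL,
                        DB_UNKNOWN, DB_RDONLY, 0);
        if (ret) { fprintf(stderr, "open src: %s\n", db_strerror(ret)); return 1; }

        /* create a fresh database to receive the valid entries */
        db_create(&dst, NULL, 0);
        ret = dst->open(dst, NULL, "dataspace_attributes.rebuilt.db", NULL,
                        DB_BTREE, DB_CREATE, 0600);
        if (ret) { fprintf(stderr, "open dst: %s\n", db_strerror(ret)); return 1; }

        ret = src->cursor(src, NULL, &cur, 0);
        if (ret) { fprintf(stderr, "cursor: %s\n", db_strerror(ret)); return 1; }

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));

        /* walk forward, copying every pair the cursor can still read */
        while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0) {
            dst->put(dst, NULL, &key, &data, 0);
            copied++;
        }

        if (ret != DB_NOTFOUND) {
            /* stopped at the corrupt region; repositioning past it (for
             * example with DB_SET_RANGE on a later key) would be the
             * "read after the corruption" half, not shown here */
            fprintf(stderr, "stopped after %lu entries: %s\n",
                    copied, db_strerror(ret));
        } else {
            printf("copied %lu entries\n", copied);
        }

        cur->c_close(cur);
        src->close(src, 0);
        dst->close(dst, 0);
        return ret == DB_NOTFOUND ? 0 : 2;
    }

The c_get failure in Jim's server log is exactly the kind of error the copy loop above would stop on.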

thx
-boyd

On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon <[email protected]> wrote:
Jim:

I understand your situation.  Here at Clemson University, we went through
the same situation a couple of years ago.  Now we back up the metadata
databases.  We don't have the space to back up our data either!

Under no circumstances should you run pvfs2-fsck in its destructive mode;
if you do, we won't be able to help at all.  If you're willing, Omnibond
MAY be able to write some utilities to help you recover most of the data.
You will have to speak to Boyd Wilson ([email protected]) and work
something out.

Becky Ligon


On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
I made no changes to my environment; it was up and running just fine.
I ran db_recover, and it immediately returned, with no apparent sign
of doing anything but creating a log.000000001 file.

I have the CentOS DB package installed, db4-4.3.29-10.el5

I have no backups; this is my 99TB high-performance filesystem.  It is
the largest storage we have, so we have no means of backing it up.  We
don't have anything big enough to hold that much data.

Is there any hope?  Can we just identify and delete the files that
have the db damage on them?  (Note that I don't even have anywhere to
back up this data to temporarily if we do get it running, so I'd need
to "fix in place".)

thanks!
--Jim


On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon <[email protected]> wrote:
Jim:

If you haven't made any recent changes to your PVFS environment or
Berkeley DB installation, then it looks like you have a corrupted
metadata database.  There is no easy way to recover.  Sometimes the
Berkeley DB command "db_recover" might work, but PVFS doesn't have
transactions turned on, so normally it doesn't.  It's worth a try, just
to be sure.

Do you have any recent backups of the databases?  If so, you will need
to use a set of backups that were created around the same time, so the
databases will be somewhat consistent with each other.

Which version of Berkeley DB are you using?  We have had corruption
issues with older versions of it.  We strongly recommend 4.8 or higher.
There are some known problems with threads in the older versions.

Becky Ligon

On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
Hi all:

I got some notices from my users about "weirdness with pvfs2" this
morning, and went and investigated.  Eventually, I found the following
on one of my 3 servers:

[S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
[E 03/30 12:23] Warning: skipping entry.
[E 03/30 12:23] c_get failed on iteration 3044
[E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
[E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
[E 03/30 12:23] Error: Could not initialize server; aborting.

------------
pvfs2-fs.conf:
-----------

<Defaults>
        UnexpectedRequests 50
        EventLogging none
        LogStamp datetime
        BMIModules bmi_tcp
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000
        StorageSpace /mnt/pvfs2
        LogFile /var/log/pvfs2-server.log
</Defaults>

<Aliases>
        Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
        Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
        Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
</Aliases>

<Filesystem>
        Name pvfs2-fs
        ID 62659950
        RootHandle 1048576
        <MetaHandleRanges>
                Range pvfs2-io-0-0 4-715827885
                Range pvfs2-io-0-1 715827886-1431655767
                Range pvfs2-io-0-2 1431655768-2147483649
        </MetaHandleRanges>
        <DataHandleRanges>
                Range pvfs2-io-0-0 2147483650-2863311531
                Range pvfs2-io-0-1 2863311532-3579139413
                Range pvfs2-io-0-2 3579139414-4294967295
        </DataHandleRanges>
        <StorageHints>
                TroveSyncMeta yes
                TroveSyncData no
        </StorageHints>
</Filesystem>
-------------
Any suggestions for recovery?

Thanks!
--Jim



--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina





--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina




_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
