Here is a newer copy:
http://www.orangefs.org/fisheye/orangefs/browse/~raw,r=9138/orangefs/trunk/doc/db-recovery.txt
-Phil
On 04/04/2012 03:58 PM, Jim Kusznir wrote:
Hmm... my db-recovery docs say that "This example only works for the keyval.db file. The dataspace_attributes.db file requires a different modification (not provided here)." The file I'm having trouble with is dataspace_attributes.db.
--Jim
On Wed, Apr 4, 2012 at 11:04 AM, Phil Carns<[email protected]> wrote:
Another option to consider is the technique described in
pvfs2/doc/db-recovery.txt. It describes how to dump and reload two types of
db files. The latter is the one you want in this case
(dataspace_attributes.db). Please make a backup copy of the original .db
file if you try this.
One thing to look out for that isn't mentioned in the doc is that the
rebuilt dataspace_attributes.db will probably be _much_ smaller than the
original. This doesn't mean that it lost data; it's just that Berkeley DB
will pack it much more efficiently when all of the entries are rebuilt at
once.
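In case a concrete example helps, below is a rough sketch of the dump-and-reload idea done directly with the Berkeley DB C API rather than the steps in the doc; it just walks the old file with a cursor and copies every record it can still read into a fresh database. The file names, the DB_BTREE access method, and the open flags are assumptions on my part, so check db-recovery.txt and the dbpf code in the PVFS source for the exact settings the server expects before swapping a rebuilt file into place. Compile it against the same Berkeley DB version the server uses (e.g. cc salvage.c -ldb), run it on a copy, and compare record counts before trusting the result.

#include <db.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical salvage sketch: copy every record that is still readable
 * from a backup copy of the damaged db into a freshly created db.
 * File names and the DB_BTREE access method are assumptions. */
int main(void)
{
    DB *src = NULL, *dst = NULL;
    DBC *cursor = NULL;
    DBT key, data;
    int ret;

    /* Open the backup copy of the damaged database read-only;
     * DB_UNKNOWN lets Berkeley DB detect the access method. */
    if ((ret = db_create(&src, NULL, 0)) != 0 ||
        (ret = src->open(src, NULL, "dataspace_attributes.db.bak",
                         NULL, DB_UNKNOWN, DB_RDONLY, 0)) != 0) {
        fprintf(stderr, "open source: %s\n", db_strerror(ret));
        return 1;
    }

    /* Create an empty database to receive the surviving records. */
    if ((ret = db_create(&dst, NULL, 0)) != 0 ||
        (ret = dst->open(dst, NULL, "dataspace_attributes.db.new",
                         NULL, DB_BTREE, DB_CREATE | DB_EXCL, 0644)) != 0) {
        fprintf(stderr, "open destination: %s\n", db_strerror(ret));
        return 1;
    }

    if ((ret = src->cursor(src, NULL, &cursor, 0)) != 0) {
        fprintf(stderr, "cursor: %s\n", db_strerror(ret));
        return 1;
    }

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));

    /* Walk the old database and copy every record we can still read. */
    while ((ret = cursor->c_get(cursor, &key, &data, DB_NEXT)) == 0) {
        if ((ret = dst->put(dst, NULL, &key, &data, 0)) != 0) {
            fprintf(stderr, "put: %s\n", db_strerror(ret));
            break;
        }
    }
    if (ret != DB_NOTFOUND)
        fprintf(stderr, "iteration stopped early: %s\n", db_strerror(ret));

    cursor->c_close(cursor);
    src->close(src, 0);
    dst->close(dst, 0);
    return 0;
}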
-Phil
On 04/02/2012 01:09 PM, Jim Kusznir wrote:
Thanks Boyd:
We have 3 I/O servers, each also running a metadata server. One will not come up (that's the 3rd server). I did try running the db check command (I forget the specifics), and it returned a single chunk of entries that are not readable. As you may guess from the above, I've never interacted with BDB at a direct or low level. I don't have a good answer for #3; I noticed about 1/3 of the directory entries were "red" on the terminal, and several individuals contacted me with PVFS problems.
I will begin building new versions of bdb. Do I need to install this
just on the servers, or do the clients need it as well?
--Jim
On Sun, Apr 1, 2012 at 4:03 PM, Boyd Wilson<[email protected]> wrote:
Jim,
We have been discussing your issue internally. A few questions:
1. How many metadata servers do you have?
2. Do you know which one is affected (if there is more than one)?
3. How much of the file system can you currently see?
The issue you mentioned seems to be the one we have seen with earlier versions of BerkeleyDB; we have not seen it with the newer versions, as Becky mentioned. In our discussions we couldn't recall whether we had tried low-level BDB access to the metadata to back up the unaffected entries so they could be restored into a new BDB. If you are comfortable with lower-level BDB commands, you may want to see if you can read the entries up to the corruption and after it. If you can do both, you may be able to write a small program to read all the entries out into a file or another BDB, then rebuild the BDB with the valid entries.
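Something along these lines (Berkeley DB C API) is what I have in mind for the "after the corruption" part: once a plain DB_NEXT walk starts failing, reposition the cursor with DB_SET_RANGE at a handle value past the bad region and keep copying. This is only a sketch; the 64-bit handle key and the resume value are assumptions, and the key encoding has to match what the server actually stores (check the dbpf code).

#include <db.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reposition a cursor just past a damaged key range
 * and keep copying records into 'dst'.  Assumes keys are 64-bit PVFS
 * handles stored in the server's native byte order, which you would need
 * to confirm against the dbpf code. */
static int copy_from_handle(DBC *cursor, DB *dst, uint64_t resume_handle)
{
    DBT key, data;
    int ret;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    key.data = &resume_handle;
    key.size = sizeof(resume_handle);

    /* DB_SET_RANGE positions the cursor at the smallest key >= resume_handle,
     * so we land on the first readable record after the corruption. */
    ret = cursor->c_get(cursor, &key, &data, DB_SET_RANGE);
    while (ret == 0) {
        if ((ret = dst->put(dst, NULL, &key, &data, 0)) != 0)
            return ret;
        ret = cursor->c_get(cursor, &key, &data, DB_NEXT);
    }
    return (ret == DB_NOTFOUND) ? 0 : ret;
}

Whether this really gets past the damage depends on where it is; if the btree's internal pages are corrupted, DB_SET_RANGE may fail the same way the sequential reads do, in which case a raw salvage with db_dump -r would be the fallback.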
thx
-boyd
On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon<[email protected]> wrote:
Jim:
I understand your situation. Here at Clemson University, we went through the same thing a couple of years ago. Now we back up the metadata databases. We don't have the space to back up our data either!
Under no circumstances should you run pvfs2-fsck in destructive mode; if you do, then we won't be able to help at all. If you're willing, Omnibond MAY be able to write some utilities that would help you recover most of the data. You will have to speak to Boyd Wilson ([email protected]) and work out something.
Becky Ligon
On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir<[email protected]> wrote:
I made no changes to my environment; it was up and running just fine.
I ran db_recover, and it immediately returned, with no apparent sign
of doing anything but creating a log.000000001 file.
I have the CentOS DB installed: db4-4.3.29-10.el5.
I have no backups; this is my high-performance filesystem of 99TB. It is the largest disk we have, and therefore we have no means of backing it up; we don't have anything big enough to hold that much data.
Is there any hope? Can we just identify and delete the files that have the db damage on them? (Note that I don't even have anywhere to back up this data to temporarily if we do get it running, so I'd need to "fix in place".)
thanks!
--Jim
On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon<[email protected]>
wrote:
Jim:
If you haven't made any recent changes to your PVFS environment or Berkeley DB installation, then it looks like you have a corrupted metadata database. There is no easy way to recover. Sometimes the Berkeley DB command "db_recover" might work, but PVFS doesn't have transactions turned on, so normally it doesn't. It's worth a try, just to be sure.
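In case it helps to see why, here is a minimal C sketch of roughly what the db_recover utility does: it opens the database environment with DB_RECOVER, which replays the transaction log. Since PVFS doesn't run with transactions enabled, there is essentially no log to replay, which is why it usually comes back without changing anything. The path below is made up, just for illustration:

#include <db.h>
#include <stdio.h>

int main(void)
{
    DB_ENV *env;
    int ret;

    if ((ret = db_env_create(&env, 0)) != 0) {
        fprintf(stderr, "db_env_create: %s\n", db_strerror(ret));
        return 1;
    }

    /* DB_RECOVER runs normal recovery against the transaction log in the
     * environment directory (hypothetical path below).  With no log,
     * there is nothing to replay. */
    ret = env->open(env, "/mnt/pvfs2/storage-dir",
                    DB_CREATE | DB_INIT_LOG | DB_INIT_MPOOL |
                    DB_INIT_TXN | DB_RECOVER, 0);
    if (ret != 0)
        fprintf(stderr, "environment open/recover: %s\n", db_strerror(ret));

    env->close(env, 0);
    return 0;
}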
Do you have any recent backups of the databases? If so, you will need to use a set of backups that were created around the same time, so the databases will be somewhat consistent with each other.
Which version of Berkeley DB are you using? We have had corruption issues with older versions of it; we strongly recommend 4.8 or higher. There are some known problems with threads in the older versions.
Becky Ligon
On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir<[email protected]>
wrote:
Hi all:
I got some notices from my users about "weirdness with pvfs2" this morning and went and investigated. Eventually, I found the following on one of my 3 servers:
[S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2
starting...
[E 03/30 12:23] Warning: got invalid handle or key size in
dbpf_dspace_iterate_handles().
[E 03/30 12:23] Warning: skipping entry.
[E 03/30 12:23] c_get failed on iteration 3044
[E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[E 03/30 12:23] Error adding handle range
1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
[E 03/30 12:23] Error: Could not initialize server interfaces;
aborting.
[E 03/30 12:23] Error: Could not initialize server; aborting.
------------
pvfs2-fs.conf:
-----------
<Defaults>
UnexpectedRequests 50
EventLogging none
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
StorageSpace /mnt/pvfs2
LogFile /var/log/pvfs2-server.log
</Defaults>
<Aliases>
Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 62659950
RootHandle 1048576
<MetaHandleRanges>
Range pvfs2-io-0-0 4-715827885
Range pvfs2-io-0-1 715827886-1431655767
Range pvfs2-io-0-2 1431655768-2147483649
</MetaHandleRanges>
<DataHandleRanges>
Range pvfs2-io-0-0 2147483650-2863311531
Range pvfs2-io-0-1 2863311532-3579139413
Range pvfs2-io-0-2 3579139414-4294967295
</DataHandleRanges>
<StorageHints>
TroveSyncMeta yes
TroveSyncData no
</StorageHints>
</Filesystem>
-------------
Any suggestions for recovery?
Thanks!
--Jim
--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users