On 04/05/2012 01:47 PM, Jim Kusznir wrote:
I think it's repaired. After using Phil's method, I got a file whose
full contents pvfs2-db-display could display, so I started the server
and got:
[S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
[E 04/05 10:45] Warning: skipping entry.
[S 04/05 10:45] PVFS2 Server ready.
I believe this means recovery is as complete as possible, and that
there's an entry that's missing now. Is this correct?
At the very least, the .db file that you have now is entirely valid from
Berkeley DB's point of view. It looks like there is a stray entry in
there that PVFS doesn't understand, but it shouldn't interfere with
anything. You will just see that warning when you start the server.
Is it ready to
go back into production (once I update versions of db and pvfs2)?
I would think so. You mentioned originally that some users were seeing
some "weirdness", so maybe you can get someone to check whatever data
they were working with before to see if it looks OK.
-Phil
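A quick way to check a startup log like the one above is to count the skipped entries and confirm that the server reached the ready state. The following is only a sketch: the line pattern is inferred from the log excerpts in this thread, not from PVFS documentation, and the function names are made up for illustration. Mail clients often wrap long log lines, so lines without a bracketed prefix are folded into the previous entry.

```python
import re

# Pattern inferred from the log excerpts in this thread (an assumption):
# [LEVEL MM/DD HH:MM] message
ENTRY = re.compile(r"^\[([A-Z]) (\d{2}/\d{2} \d{2}:\d{2})\] (.*)$")


def parse_log(text):
    """Return (level, timestamp, message) tuples, folding wrapped lines
    into the entry that precedes them."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        match = ENTRY.match(stripped)
        if match:
            entries.append(list(match.groups()))
        elif entries and stripped:
            entries[-1][2] += " " + stripped
    return [tuple(entry) for entry in entries]


def summarize(entries):
    """Count skipped entries and report whether the server became ready."""
    skipped = sum(1 for _, _, msg in entries if msg == "Warning: skipping entry.")
    ready = any(msg == "PVFS2 Server ready." for _, _, msg in entries)
    return {"skipped": skipped, "ready": ready}


# Log text copied from the message above, including one wrapped line.
LOG = """\
[S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 04/05 10:45] Warning: got invalid handle or key size in
dbpf_dspace_iterate_handles().
[E 04/05 10:45] Warning: skipping entry.
[S 04/05 10:45] PVFS2 Server ready.
"""

print(summarize(parse_log(LOG)))
```

For the log quoted here this reports one skipped entry and a ready server, which matches Phil's reading: one stray record, otherwise a clean start.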
--Jim
On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles<[email protected]> wrote:
Try "make develtools".
-- Elaine
-----Original Message-----
From: Jim Kusznir [mailto:[email protected]]
Sent: Wednesday, April 04, 2012 3:45 PM
To: Elaine Quarles
Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
I patched everything and ran configure and make, but it didn't build
pvfs2-db-display. The .c file is present. I haven't found the magic make
command to cause that to be built either... Suggestions?
--Jim
On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles<[email protected]> wrote:
Sorry for the delay. Attached are db-display.tar. If you expand this
from the top level directory of your source tree it will create the
src/apps/devel directory. Makefile.in.patch will patch your
Makefile.in with the logic necessary to build pvfs2-db-display. Please
note that it is necessary to run the configure script to update your
Makefile.
Please send the results of running this utility so we can determine
whether it is necessary to try continuous forward reading through the
database, skipping error records or whether we will have to also read
from the end of the database backwards.
Thanks,
Elaine
-----Original Message-----
From: Jim Kusznir [mailto:[email protected]]
Sent: Wednesday, April 04, 2012 1:56 PM
To: Elaine Quarles
Cc: Becky Ligon
Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
Any updates? My entire cluster is still offline due to this problem,
and my users are starting to look for their pitchforks....
Thanks!
--Jim
On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles<[email protected]>
wrote:
Jim,
Could you please check whether your pvfs 2.8.2 distribution contains
src/apps/devel/pvfs2-db-display.c? If so you can build it by running
"make develtools". If your distribution does not contain this file
let me know and I will send a patch.
If you already have the utility, please redirect the output and send
it so we can see what it has to say about the state of the database
and determine the next step from there.
Here is the command-line format.
Usage: ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
Example: ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
Options:
  --verbose         Enable verbose output
  --help            This message.
  --dbpath <path>   The path of the server's StorageSpace. The path
                    should contain collections.db and
                    storage_attributes.db
  --hexdir <dir>    The directory in dbpath that contains
                    collection_attributes.db, dataspace_attributes.db
                    and keyval.db
Thanks,
Elaine
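As a rough illustration of "redirect the output and send it", the sketch below assembles the command line from the usage text above and writes everything the tool prints to a file. Only the flags shown in the usage text are used. The helper names and the output file name are hypothetical, and the paths are the example values from the usage text, not a real StorageSpace.

```python
import subprocess


def build_command(dbpath, hexdir, verbose=False, tool="./pvfs2-db-display"):
    """Assemble the argument list using only flags from the usage text."""
    command = [tool, "--dbpath", dbpath, "--hexdir", hexdir]
    if verbose:
        command.append("--verbose")
    return command


def dump_to_file(command, output_path):
    """Run the command, send stdout and stderr to one file, return the exit code."""
    with open(output_path, "w") as out:
        result = subprocess.run(command, stdout=out, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Example values from the usage text; substitute the real StorageSpace
    # path and collection directory.
    cmd = build_command("/tmp/pvfs2-space", "4e3f77a5")
    print(" ".join(cmd))
    # Hypothetical output file name; uncomment on a machine that has the tool.
    # dump_to_file(cmd, "db-display.out")
```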
-----Original Message-----
From: Jim Kusznir [mailto:[email protected]]
Sent: Monday, April 02, 2012 5:57 PM
To: [email protected]
Cc: [email protected]; [email protected];
[email protected]
Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
If this is the recommended method for recovery, then let's do it.
Just one more question on how pvfs2 runs: is the metadata contained
on each server different, or should they all be identical copies? It
just occurred to me that my understanding of the metadata was that
all three metadata servers were redundant... Or is this a "different
metadata" db?
--Jim
On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon<[email protected]> wrote:
Jim:
We have a program called pvfs2-db-display that reads directly
through the Berkeley DB. We don't know for sure, but we might be
able to use whatever information it will give to recover what we
can. The program reads from the database from logical top to
bottom. We can also change it to read from logical bottom to top.
In this way, we MAY be able to recover the good data that is still
there above and below the corrupted area. We've never done this but
we are willing to give it a try.
Let us know if you'd like to try this!
Becky
--
Becky Ligon
HPC Admin Staff
PVFS/OrangeFS Developer
Clemson University/Omnibond.com OrangeFS Support
864-650-4065
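The top-to-bottom and bottom-to-top reads Becky describes can be pictured with a toy model. This is not Berkeley DB code and not how pvfs2-db-display is implemented: a plain Python list stands in for the database, None marks a record the cursor cannot read, and the function names are invented for illustration.

```python
# Toy model only: a list stands in for the database and None marks
# records that could not be read. No Berkeley DB calls are involved.

def read_until_error(records):
    """Collect records in order, stopping at the first unreadable one."""
    good = []
    for record in records:
        if record is None:
            break
        good.append(record)
    return good


def salvage_both_ends(records):
    """Read from the logical top, then from the logical bottom, and
    combine whatever each pass could reach."""
    head = read_until_error(records)
    if len(head) == len(records):
        return head  # nothing unreadable, the forward pass got everything
    tail = read_until_error(reversed(records))
    tail.reverse()  # put the backward pass into logical order again
    return head + tail


database = ["a", "b", None, None, "e", "f", "g"]
print(salvage_both_ends(database))
```

In this model, good records that sit between two separate damaged regions are not reached by either pass, which is one reason a forward read that skips error records may also be worth trying.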
Your solution sounds like what I am trying to do; I'd prefer to
install db4 into /opt.
If I can get your spec file or srpm, I'd greatly appreciate it!
--Jim
On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon<[email protected]>
wrote:
Jim:
We downloaded the software from the Oracle site and created an rpm
from that. We are running CentOS 5 on our production servers with
kernel=2.6.18-238.9.1.el5 and have been running a version of db4
for at least the past 3 years. So, you should be able to create
the rpm. I can send you the rpm that we are using but it is
tailored to our environment; we install db4 in /opt/db4, because
other items depend on the installed version.
Becky
On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir<[email protected]>
wrote:
I've been trying to build a db4 rpm on my CentOS box, but it
appears it has dependencies that require an OS upgrade... How did
you get anything newer than the stock db4 installed on CentOS 5?
--Jim
On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon<[email protected]>
wrote:
Jim:
I understand your situation. Here at Clemson University, we
went through the same situation a couple of years ago. Now, we
back up the metadata databases. We don't have the space to back up
our data either!
Under no circumstances should you run pvfs2-fsck. If you run this
command in the destructive mode, we won't be able to help at all.
If you're willing, Omnibond MAY be able to write some utilities
that will help you recover most of the data. You will have to
speak to Boyd Wilson ([email protected]) and work out something.
Becky Ligon
On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir
<[email protected]>
wrote:
I made no changes to my environment; it was up and running just fine.
I ran db_recover, and it immediately returned, with no
apparent sign of doing anything but creating a log.000000001 file.
I have the CentOS DB installed, db4-4.3.29-10.el5.
I have no backups; this is my high performance filesystem of 99TB. It
is the largest disk we have, and we therefore have no means of backing
it up. We don't have anything big enough to hold that much data.
Is there any hope? Can we just identify and delete the files
that have the db damage on them? (Note that I don't even have anywhere
to back up this data to temporarily if we do get it running, so
I'd need to "fix in place".)
Thanks!
--Jim
On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon
<[email protected]>
wrote:
Jim:
If you haven't made any recent changes to your pvfs
environment or Berkeley DB installation, then it looks like
you have a corrupted metadata database.
There is no easy way to recover. Sometimes, the Berkeley
DB command "db_recover" might work, but PVFS doesn't have
transactions turned on, so normally it doesn't work. It's
worth a try, just to be sure.
Do you have any recent backups of the databases? If so,
then you will need to use a set of backups that were created
around the same time, so the databases will be somewhat
consistent with each other.
Which version of Berkeley DB are you using? We have had
corruption issues with older versions of it. We strongly
recommend 4.8 or higher. There are some known problems with
threads in the older versions.
Becky Ligon
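For sites that cannot back up the data itself, copying just the metadata databases in one pass is one way to get a set "created around the same time". The sketch below is an assumption-laden illustration, not a PVFS tool: it assumes the server is stopped while the copy runs, that every file ending in .db under the StorageSpace is a metadata database (as the file names in this thread suggest), and the function name and directory naming are invented.

```python
import shutil
import tempfile
import time
from pathlib import Path


def snapshot_databases(storage_space, backup_root):
    """Copy every *.db file under storage_space into one timestamped
    directory, keeping the relative layout. Assumes the server is stopped."""
    source = Path(storage_space)
    target = Path(backup_root) / time.strftime("pvfs2-meta-%Y%m%d-%H%M%S")
    copied = []
    # sorted() collects the file list before any copy is made
    for db_file in sorted(source.rglob("*.db")):
        destination = target / db_file.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(db_file, destination)  # copy2 keeps modification times
        copied.append(destination)
    return target, copied


# Demo on a throwaway directory tree, not on a real StorageSpace.
with tempfile.TemporaryDirectory() as scratch:
    space = Path(scratch) / "space"
    (space / "4e3f77a5").mkdir(parents=True)
    (space / "collections.db").write_bytes(b"x")
    (space / "4e3f77a5" / "keyval.db").write_bytes(b"y")
    (space / "4e3f77a5" / "not-a-database.txt").write_bytes(b"z")
    target, copied = snapshot_databases(space, Path(scratch) / "backup")
    names = [p.relative_to(target).as_posix() for p in copied]

print(names)
```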
On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir
<[email protected]>
wrote:
Hi all:
I got some notices from my users about "weirdness with pvfs2"
this morning, and went and investigated. Eventually, I
found the following on one of my 3 servers:
[S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
[E 03/30 12:23] Warning: skipping entry.
[E 03/30 12:23] c_get failed on iteration 3044
[E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
[E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
[E 03/30 12:23] Error: Could not initialize server; aborting.
------------
pvfs2-fs.conf:
-----------
<Defaults>
UnexpectedRequests 50
EventLogging none
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
StorageSpace /mnt/pvfs2
LogFile /var/log/pvfs2-server.log
</Defaults>
<Aliases>
Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 62659950
RootHandle 1048576
<MetaHandleRanges>
Range pvfs2-io-0-0 4-715827885
Range pvfs2-io-0-1 715827886-1431655767
Range pvfs2-io-0-2 1431655768-2147483649
</MetaHandleRanges>
<DataHandleRanges>
Range pvfs2-io-0-0 2147483650-2863311531
Range pvfs2-io-0-1 2863311532-3579139413
Range pvfs2-io-0-2 3579139414-4294967295
</DataHandleRanges>
<StorageHints>
TroveSyncMeta yes
TroveSyncData no
</StorageHints>
</Filesystem>
-------------
Any suggestions for recovery?
Thanks!
--Jim
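The handle range named in the "Error adding handle range" line can be matched against the ranges in pvfs2-fs.conf. The sketch below uses the range values copied from the config above. The helper names are hypothetical, and the exact-match lookup is just one simple way to compare them, not how the server does it.

```python
# Range values copied from the <MetaHandleRanges> and <DataHandleRanges>
# sections of the config above. Helper names are hypothetical.
META_RANGES = {
    "pvfs2-io-0-0": (4, 715827885),
    "pvfs2-io-0-1": (715827886, 1431655767),
    "pvfs2-io-0-2": (1431655768, 2147483649),
}
DATA_RANGES = {
    "pvfs2-io-0-0": (2147483650, 2863311531),
    "pvfs2-io-0-1": (2863311532, 3579139413),
    "pvfs2-io-0-2": (3579139414, 4294967295),
}


def parse_range_list(text):
    """Parse 'a-b,c-d' as printed in the server log into [(a, b), (c, d)]."""
    ranges = []
    for part in text.split(","):
        low, high = part.strip().split("-")
        ranges.append((int(low), int(high)))
    return ranges


def owners_of(range_list):
    """Return the servers whose configured meta or data range appears
    in range_list exactly."""
    owners = set()
    for table in (META_RANGES, DATA_RANGES):
        for server, configured in table.items():
            if configured in range_list:
                owners.add(server)
    return owners


def is_contiguous(table):
    """True if the ranges, sorted by start, neither overlap nor leave gaps."""
    ordered = sorted(table.values())
    return all(prev[1] + 1 == cur[0] for prev, cur in zip(ordered, ordered[1:]))


# Range string copied from the failed-startup log.
failed = parse_range_list("1431655768-2147483649,3579139414-4294967295")
print(owners_of(failed))
print(is_contiguous(META_RANGES), is_contiguous(DATA_RANGES))
```

Both ranges in the error belong to pvfs2-io-0-2, the node that failed to start, and the configured ranges themselves are contiguous, so the config does not look like the source of the problem.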
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina