Thanks, Boyd:

We have three I/O servers, each also running a metadata server.  One will
not come up (that's the third server).  I did try running the db check
command (I forget the specifics), and it did return a single chunk of
entries that are not readable.  As you may guess from the above, I've
never interacted with BDB at a direct or low level.  I don't have a
good answer for #3; I noticed about 1/3 of the directory entries showed
up "red" on the terminal, and several individuals contacted me with PVFS
problems.
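
(For reference, here is a rough sketch of that kind of readability check
through the Berkeley DB C API, essentially what db_verify/db_dump do
under the hood; the database filename is only an example and error
handling is trimmed.)

#include <stdio.h>
#include <db.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "dataspace_attributes.db";
    DB *dbp;
    int ret;

    if ((ret = db_create(&dbp, NULL, 0)) != 0) {
        fprintf(stderr, "db_create: %s\n", db_strerror(ret));
        return 1;
    }

    /* DB_SALVAGE writes every key/data pair it can still read to the
     * given FILE * in db_dump format; DB_AGGRESSIVE also pulls data off
     * questionable pages.  DB->verify() discards the handle itself,
     * even on failure, so no DB->close() afterwards. */
    ret = dbp->verify(dbp, path, NULL, stdout, DB_SALVAGE | DB_AGGRESSIVE);
    if (ret != 0)
        fprintf(stderr, "verify: %s\n", db_strerror(ret));
    return ret ? 1 : 0;
}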

I will begin building a newer version of BDB.  Do I need to install it
just on the servers, or do the clients need it as well?

--Jim

On Sun, Apr 1, 2012 at 4:03 PM, Boyd Wilson <[email protected]> wrote:
> Jim,
> We have been discussing your issue internally.  A few questions:
> 1. How many metadata servers do you have?
> 2. Do you know which one is affected (if there is more than one)?
> 3. How much of the file system can you currently see?
>
> The issue you mentioned seems to be the one we have seen with earlier
> versions of Berkeley DB; as Becky mentioned, we have not seen it with
> the newer versions.  In our discussions we couldn't recall whether we
> ever tried low-level BDB access to the metadata to back up the
> unaffected entries so they could be restored into a new BDB.  If you
> are comfortable with lower-level BDB commands, you may want to see
> whether you can read the entries up to the corruption and after it.  If
> you can do both, you may be able to write a small program to read all
> the entries out into a file or another BDB, then rebuild the database
> with the valid entries.
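> Something along these lines, very roughly (error handling trimmed, the
> filenames are only examples, and the real PVFS databases may register
> custom key-comparison functions that a proper rebuild would need to
> match):
>
> #include <stdio.h>
> #include <string.h>
> #include <db.h>
>
> int main(void)
> {
>     DB *old_db, *new_db;
>     DBC *cur;
>     DBT key, data;
>     int ret;
>
>     db_create(&old_db, NULL, 0);
>     db_create(&new_db, NULL, 0);
>     old_db->open(old_db, NULL, "dataspace_attributes.db", NULL,
>                  DB_UNKNOWN, DB_RDONLY, 0);
>     new_db->open(new_db, NULL, "dataspace_attributes.new.db", NULL,
>                  DB_BTREE, DB_CREATE, 0644);
>
>     old_db->cursor(old_db, NULL, &cur, 0);
>     memset(&key, 0, sizeof(key));
>     memset(&data, 0, sizeof(data));
>
>     /* Copy every readable entry forward; when c_get hits the corrupt
>      * region it will fail, and one could then try repositioning past
>      * it with DB_SET_RANGE and continue. */
>     while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0)
>         new_db->put(new_db, NULL, &key, &data, 0);
>
>     if (ret != DB_NOTFOUND)
>         fprintf(stderr, "stopped early: %s\n", db_strerror(ret));
>
>     cur->c_close(cur);
>     old_db->close(old_db, 0);
>     new_db->close(new_db, 0);
>     return 0;
> }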
>
> thx
> -boyd
>
> On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon <[email protected]> wrote:
>>
>> Jim:
>>
>> I understand your situation.  Here at Clemson University, we went through
>> the same situation a couple of years ago.  Now we back up the metadata
>> databases.  We don't have the space to back up our data either!
>>
>> Under no circumstances should you run pvfs2-fsck in its destructive
>> mode; if you do, we won't be able to help at all.  If you're willing,
>> Omnibond MAY be able to write some utilities to help you recover most
>> of the data.  You will have to speak with Boyd Wilson
>> ([email protected]) and work out something.
>>
>> Becky Ligon
>>
>>
>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
>>>
>>> I made no changes to my environment; it was up and running just fine.
>>> I ran db_recover, and it immediately returned, with no apparent sign of
>>> having done anything other than creating a log.000000001 file.
>>>
>>> I have the CentOS db4 package installed: db4-4.3.29-10.el5
>>>
>>> I have no backups; this is my high-performance filesystem of 99 TB; it
>>> is the largest storage we have, and therefore we have no means of
>>> backing it up.  We don't have anything big enough to hold that much data.
>>>
>>> Is there any hope?  Can we just identify and delete the files that
>>> have the db damage on them?  (Note that I don't even have anywhere to
>>> back this data up temporarily if we do get it running, so I'd need
>>> to "fix in place".)
>>>
>>> thanks!
>>> --Jim
>>>
>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon <[email protected]> wrote:
>>> > Jim:
>>> >
>>> > If you haven't made any recent changes to your PVFS environment or
>>> > Berkeley DB installation, then it looks like you have a corrupted
>>> > metadata database.  There is no easy way to recover.  Sometimes the
>>> > Berkeley DB command "db_recover" might work, but PVFS doesn't have
>>> > transactions turned on, so normally it doesn't.  It's worth a try,
>>> > just to be sure.
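>>> > For reference, db_recover is roughly equivalent to opening the
>>> > environment with the DB_RECOVER flag and replaying the transaction
>>> > logs, so with transactions off there is little for it to replay.  A
>>> > minimal sketch, with the environment path taken from the StorageSpace
>>> > setting in your config (adjust as needed):
>>> >
>>> > #include <stdio.h>
>>> > #include <db.h>
>>> >
>>> > int main(void)
>>> > {
>>> >     DB_ENV *env;
>>> >     int ret;
>>> >
>>> >     db_env_create(&env, 0);
>>> >     /* DB_RECOVER requires DB_CREATE and DB_INIT_TXN. */
>>> >     ret = env->open(env, "/mnt/pvfs2",
>>> >                     DB_CREATE | DB_INIT_MPOOL | DB_INIT_TXN |
>>> >                     DB_INIT_LOG | DB_RECOVER | DB_PRIVATE, 0);
>>> >     if (ret != 0)
>>> >         fprintf(stderr, "recover: %s\n", db_strerror(ret));
>>> >     env->close(env, 0);
>>> >     return ret ? 1 : 0;
>>> > }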
>>> >
>>> > Do you have any recent backups of the databases?  If so, then you will
>>> > need to use a set of backups that were created around the same time,
>>> > so the databases will be somewhat consistent with each other.
>>> >
>>> > Which version of Berkeley DB are you using?  We have had corruption
>>> > issues with older versions of it.  We strongly recommend 4.8 or
>>> > higher.  There are some known problems with threads in the older
>>> > versions.
>>> >
>>> > Becky Ligon
>>> >
>>> > On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hi all:
>>> >>
>>> >> I got some notices from my users about "weirdness with pvfs2" this
>>> >> morning, and went and investigated.  Eventually, I found the following
>>> >> on one of my 3 servers:
>>> >>
>>> >> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2
>>> >> starting...
>>> >> [E 03/30 12:23] Warning: got invalid handle or key size in
>>> >> dbpf_dspace_iterate_handles().
>>> >> [E 03/30 12:23] Warning: skipping entry.
>>> >> [E 03/30 12:23] c_get failed on iteration 3044
>>> >> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>> >> [E 03/30 12:23] Error adding handle range
>>> >> 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>> >> [E 03/30 12:23] Error: Could not initialize server interfaces;
>>> >> aborting.
>>> >> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>> >>
>>> >> ------------
>>> >> pvfs2-fs.conf:
>>> >> -----------
>>> >>
>>> >> <Defaults>
>>> >>        UnexpectedRequests 50
>>> >>        EventLogging none
>>> >>        LogStamp datetime
>>> >>        BMIModules bmi_tcp
>>> >>        FlowModules flowproto_multiqueue
>>> >>        PerfUpdateInterval 1000
>>> >>        ServerJobBMITimeoutSecs 30
>>> >>        ServerJobFlowTimeoutSecs 30
>>> >>        ClientJobBMITimeoutSecs 300
>>> >>        ClientJobFlowTimeoutSecs 300
>>> >>        ClientRetryLimit 5
>>> >>        ClientRetryDelayMilliSecs 2000
>>> >>        StorageSpace /mnt/pvfs2
>>> >>        LogFile /var/log/pvfs2-server.log
>>> >> </Defaults>
>>> >>
>>> >> <Aliases>
>>> >>        Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>> >>        Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>> >>        Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>> >> </Aliases>
>>> >>
>>> >> <Filesystem>
>>> >>        Name pvfs2-fs
>>> >>        ID 62659950
>>> >>        RootHandle 1048576
>>> >>        <MetaHandleRanges>
>>> >>                Range pvfs2-io-0-0 4-715827885
>>> >>                Range pvfs2-io-0-1 715827886-1431655767
>>> >>                Range pvfs2-io-0-2 1431655768-2147483649
>>> >>        </MetaHandleRanges>
>>> >>        <DataHandleRanges>
>>> >>                Range pvfs2-io-0-0 2147483650-2863311531
>>> >>                Range pvfs2-io-0-1 2863311532-3579139413
>>> >>                Range pvfs2-io-0-2 3579139414-4294967295
>>> >>        </DataHandleRanges>
>>> >>        <StorageHints>
>>> >>                TroveSyncMeta yes
>>> >>                TroveSyncData no
>>> >>        </StorageHints>
>>> >> </Filesystem>
>>> >> -------------
>>> >> Any suggestions for recovery?
>>> >>
>>> >> Thanks!
>>> >> --Jim
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Becky Ligon
>>> > OrangeFS Support and Development
>>> > Omnibond Systems
>>> > Anderson, South Carolina
>>> >
>>> >
>>
>>
>>
>>
>> --
>> Becky Ligon
>> OrangeFS Support and Development
>> Omnibond Systems
>> Anderson, South Carolina
>>
>>
>>
>>
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
