Jim,
We have been discussing your issue internally.  A few questions:
1. How many metadata servers do you have?
2. Do you know which one is affected (if there is more than one)?
3. How much of the file system can you currently see?

The issue you mentioned seems to be one we have seen with earlier
versions of BerkeleyDB; as Becky mentioned, we have not seen it with
the newer versions.  In our discussions we couldn't recall whether we
had ever tried low-level BDB access to the metadata to back up the
unaffected entries so they could be restored into a new BDB.  If you
are comfortable with lower-level BDB commands, you may want to see
whether you can read the entries both before and after the corruption.
If you can do both, you may be able to write a small program that
reads all the valid entries out into a file or another BDB and then
rebuilds the BDB from them.
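
A rough, untested sketch of what such a program might look like,
written against the BDB 4.x C API, follows.  The filenames are
assumptions (dataspace_attributes.db is one of the metadata databases
in the storage space); adapt them to whichever database is affected,
and build against the same BDB version the server uses (e.g.
cc salvage.c -ldb):

    /*
     * salvage.c -- minimal sketch only, untested.  Copies every
     * readable record from a damaged database into a fresh one and
     * stops, with a message, at the first record the cursor cannot
     * read.
     */
    #include <db.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        DB *src, *dst;
        DBC *cur;
        DBT key, val;
        int ret;

        /* Open the damaged database read-only; the type is taken
         * from the file itself. */
        db_create(&src, NULL, 0);
        if ((ret = src->open(src, NULL, "dataspace_attributes.db",
                             NULL, DB_UNKNOWN, DB_RDONLY, 0)) != 0) {
            fprintf(stderr, "open source: %s\n", db_strerror(ret));
            return 1;
        }

        /* Create a fresh database to hold the salvaged entries. */
        db_create(&dst, NULL, 0);
        if ((ret = dst->open(dst, NULL, "dataspace_attributes.new",
                             NULL, DB_BTREE, DB_CREATE, 0644)) != 0) {
            fprintf(stderr, "open dest: %s\n", db_strerror(ret));
            return 1;
        }

        memset(&key, 0, sizeof(key));
        memset(&val, 0, sizeof(val));

        /* Walk the damaged database, copying every entry that reads
         * cleanly. */
        src->cursor(src, NULL, &cur, 0);
        while ((ret = cur->c_get(cur, &key, &val, DB_NEXT)) == 0)
            dst->put(dst, NULL, &key, &val, 0);

        if (ret != DB_NOTFOUND)
            /* We hit the corruption.  A second pass could try to jump
             * past it with DB_SET_RANGE on a key known to come after
             * the bad spot. */
            fprintf(stderr, "c_get stopped: %s\n", db_strerror(ret));

        cur->c_close(cur);
        src->close(src, 0);
        dst->close(dst, 0);
        return 0;
    }

The stock db_dump utility's -r (salvage) flag, piped into db_load,
attempts much the same thing and may be worth trying before writing
anything custom.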

thx
-boyd

On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> I understand your situation.  Here at Clemson University, we went through
> the same thing a couple of years ago.  Now we back up the metadata
> databases.  We don't have the space to back up our data either!
>
> Under no circumstances should you run pvfs2-fsck in its destructive
> mode.  If you do, we won't be able to help at all.  If you're willing,
> Omnibond MAY be able to write some utilities to help you recover most of
> the data.  You will have to speak to Boyd Wilson ([email protected]) and
> work out something.
>
> Becky Ligon
>
>
> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
>
>> I made no changes to my environment; it was up and running just fine.
>> I ran db_recover, and it returned immediately with no apparent sign of
>> doing anything but creating a log.000000001 file.
>>
>> I have the stock CentOS DB package installed: db4-4.3.29-10.el5.
>>
>> I have no backups; this is my high-performance filesystem of 99TB.  It
>> is the largest disk we have, so we have no means of backing it up.  We
>> don't have anything big enough to hold that much data.
>>
>> Is there any hope?  Can we just identify and delete the files that
>> have the db damage on them?  (Note that I don't even have anywhere to
>> back up this data to temporarily if we do get it running, so I'd need
>> to "fix in place".)
>>
>> thanks!
>> --Jim
>>
>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon <[email protected]> wrote:
>> > Jim:
>> >
>> > If you haven't made any recent changes to your PVFS environment or
>> > Berkeley DB installation, then it looks like you have a corrupted
>> > metadata database.  There is no way to recover easily.  Sometimes the
>> > Berkeley DB command "db_recover" might work, but PVFS doesn't have
>> > transactions turned on, so normally it doesn't.  It's worth a try,
>> > just to be sure.
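>> >
>> > For reference, a typical invocation (assuming the database
>> > environment home is the StorageSpace directory from your config,
>> > /mnt/pvfs2) would be something like:
>> >
>> >     db_recover -v -h /mnt/pvfs2
>> >
>> > -v prints what was done and -h names the environment home.  With
>> > transactions off there are usually no log records to replay, so it
>> > may return without doing anything useful.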
>> >
>> > Do you have any recent backups of the databases?  If so, you will
>> > need to use a set of backups that were created around the same time,
>> > so the databases will be somewhat consistent with each other.
>> >
>> > Which version of Berkeley DB are you using?  We have had corruption
>> > issues with older versions of it.  We strongly recommend 4.8 or
>> > higher.  There are some known problems with threads in the older
>> > versions.
>> >
>> > Becky Ligon
>> >
>> > On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>> >>
>> >> Hi all:
>> >>
>> >> I got some notices from my users about "weirdness with pvfs2" this
>> >> morning and went and investigated.  Eventually, I found the following
>> >> on one of my 3 servers:
>> >>
>> >> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>> >> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>> >> [E 03/30 12:23] Warning: skipping entry.
>> >> [E 03/30 12:23] c_get failed on iteration 3044
>> >> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>> >> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>> >> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>> >> [E 03/30 12:23] Error: Could not initialize server; aborting.
>> >>
>> >> ------------
>> >> pvfs2-fs.conf:
>> >> -----------
>> >>
>> >> <Defaults>
>> >>        UnexpectedRequests 50
>> >>        EventLogging none
>> >>        LogStamp datetime
>> >>        BMIModules bmi_tcp
>> >>        FlowModules flowproto_multiqueue
>> >>        PerfUpdateInterval 1000
>> >>        ServerJobBMITimeoutSecs 30
>> >>        ServerJobFlowTimeoutSecs 30
>> >>        ClientJobBMITimeoutSecs 300
>> >>        ClientJobFlowTimeoutSecs 300
>> >>        ClientRetryLimit 5
>> >>        ClientRetryDelayMilliSecs 2000
>> >>        StorageSpace /mnt/pvfs2
>> >>        LogFile /var/log/pvfs2-server.log
>> >> </Defaults>
>> >>
>> >> <Aliases>
>> >>        Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>> >>        Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>> >>        Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>> >> </Aliases>
>> >>
>> >> <Filesystem>
>> >>        Name pvfs2-fs
>> >>        ID 62659950
>> >>        RootHandle 1048576
>> >>        <MetaHandleRanges>
>> >>                Range pvfs2-io-0-0 4-715827885
>> >>                Range pvfs2-io-0-1 715827886-1431655767
>> >>                Range pvfs2-io-0-2 1431655768-2147483649
>> >>        </MetaHandleRanges>
>> >>        <DataHandleRanges>
>> >>                Range pvfs2-io-0-0 2147483650-2863311531
>> >>                Range pvfs2-io-0-1 2863311532-3579139413
>> >>                Range pvfs2-io-0-2 3579139414-4294967295
>> >>        </DataHandleRanges>
>> >>        <StorageHints>
>> >>                TroveSyncMeta yes
>> >>                TroveSyncData no
>> >>        </StorageHints>
>> >> </Filesystem>
>> >> -------------
>> >> Any suggestions for recovery?
>> >>
>> >> Thanks!
>> >> --Jim
>> >
>> > --
>> > Becky Ligon
>> > OrangeFS Support and Development
>> > Omnibond Systems
>> > Anderson, South Carolina
>> >
>> >
>>
>
>
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
>
>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
