Phil/Jim:

Should you run a pvfs2-fsck at this point, maybe in non-destructive mode,
to see if we have dangling entries?

Becky

On Thu, Apr 5, 2012 at 4:27 PM, Phil Carns <[email protected]> wrote:

> On 04/05/2012 01:47 PM, Jim Kusznir wrote:
>
>> I think it's repaired.  After using Phil's method, I got a file whose
>> content pvfs2-db-display displayed in full, so I started the server and
>> got:
>> [S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>> [E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>> [E 04/05 10:45] Warning: skipping entry.
>> [S 04/05 10:45] PVFS2 Server ready.
>>
>> I believe this means recovery is as complete as possible, and that
>> there's an entry that's missing now.  Is this correct?
>>
>
> At the very least, the .db file that you have now is entirely valid from
> Berkeley DB's point of view.  It looks like there is a stray entry in there
> that PVFS doesn't understand, but it shouldn't interfere with anything.
>  You will just see that warning when you start the server.
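One rough way to check a startup log for this pattern is sketched below in Python. It is only an illustration: the line format is taken from the log excerpts quoted in this thread, the function name `summarize_startup` is made up for the example, and treating `S` as status and `E` as error is an assumption based on how the quoted lines read.

```python
import re

# Line format as seen in the excerpts in this thread: "[S 04/05 10:45] message".
LOG_LINE = re.compile(r"^\[(?P<level>[A-Z]) \d\d/\d\d \d\d:\d\d\] (?P<msg>.*)$")

def summarize_startup(lines):
    """Return (ready, warnings, errors) for one server startup attempt."""
    ready = False
    warnings, errors = [], []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # wrapped continuation or unrelated text
        msg = m.group("msg")
        if msg.startswith("PVFS2 Server ready"):
            ready = True
        elif m.group("level") == "E":
            # In the quoted logs, "Warning:" messages are also logged at level E.
            (warnings if msg.startswith("Warning:") else errors).append(msg)
    return ready, warnings, errors

log = """\
[S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
[E 04/05 10:45] Warning: skipping entry.
[S 04/05 10:45] PVFS2 Server ready.
""".splitlines()

ready, warnings, errors = summarize_startup(log)
print(ready, len(warnings), len(errors))  # True 2 0
```

A startup that reaches "Server ready" with only warnings matches the situation described above; any non-warning `E` line would deserve a closer look.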
>
>
>> Is it ready to
>> go back into production (once I update versions of db and pvfs2)?
>>
>
> I would think so.  You mentioned originally that some users were seeing
> some "weirdness", so maybe you can ask someone to check whatever data they were
> working with before to see if it looks ok.
>
> -Phil
>
>
>> --Jim
>>
>>
>> On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles<[email protected]>
>>  wrote:
>>
>>> Try "make develtools".
>>>
>>> -- Elaine
>>>
>>> -----Original Message-----
>>> From: Jim Kusznir [mailto:[email protected]]
>>> Sent: Wednesday, April 04, 2012 3:45 PM
>>> To: Elaine Quarles
>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>> detected
>>>
>>> I patched everything and ran configure and make, but it didn't build
>>> pvfs2-db-display.  The .c file is present.  I haven't found the magic
>>> make command to cause that to be built either... Suggestions?
>>>
>>> --Jim
>>>
>>> On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles<[email protected]>
>>>  wrote:
>>>
>>>> Sorry for the delay. Attached is db-display.tar. If you expand this
>>>> from the top level directory of your source tree it will create the
>>>> src/apps/devel directory. Makefile.in.patch will patch your
>>>> Makefile.in with the logic necessary to build pvfs2-db-display. Please
>>>> note that it is necessary to run the configure script to update your
>>>> Makefile.
>>>
>>>> Please send the results of running this utility so we can determine
>>>> whether it is necessary to try continuous forward reading through the
>>>> database, skipping error records or whether we will have to also read
>>>> from the end of the database backwards.
>>>>
>>>> Thanks,
>>>> Elaine
>>>>
>>>> -----Original Message-----
>>>> From: Jim Kusznir [mailto:[email protected]]
>>>> Sent: Wednesday, April 04, 2012 1:56 PM
>>>> To: Elaine Quarles
>>>> Cc: Becky Ligon
>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>>> detected
>>>>
>>>> Any updates?  My entire cluster is still offline due to this problem,
>>>> and my users are starting to look for their pitchforks....
>>>>
>>>> Thanks!
>>>> --Jim
>>>>
>>>> On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles<[email protected]>
>>>> wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Could you please check whether your pvfs 2.8.2 distribution contains
>>>>> src/apps/devel/pvfs2-db-display.c? If so you can build it by running
>>>>> "make develtools". If your distribution does not contain this file
>>>>> let me know and I will send a patch.
>>>>>
>>>>> If you already have the utility, please redirect the output and send
>>>>> it so we can see what it has to say about the state of the database
>>>>> and determine the next step from there.
>>>>>
>>>>> Here is the command-line format.
>>>>> Usage:          ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
>>>>> Example:        ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
>>>>>
>>>>> Options:
>>>>>        --verbose               Enable verbose output
>>>>>        --help                  This message.
>>>>>        --dbpath <path>         The path of the server's StorageSpace. The path
>>>>>                                should contain collections.db and
>>>>>                                storage_attributes.db
>>>>>        --hexdir <dir>          The directory in dbpath that contains
>>>>>                                collection_attributes.db, dataspace_attributes.db
>>>>>                                and keyval.db
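As a sketch of how that command line might be assembled for the filesystem in this thread, the Python below builds the invocation with output redirected to a file. The `StorageSpace` (`/mnt/pvfs2`) and `ID` (62659950) come from the pvfs2-fs.conf posted at the start of the thread. The assumption that the hexdir is the filesystem ID written as 8 lowercase hex digits is mine; it matches the shape of the `4e3f77a5` example but is not confirmed here, so check the directory name on disk first. The helper names and the output file name are hypothetical.

```python
import shlex

def hexdir_for(fs_id: int) -> str:
    # ASSUMPTION: the collection directory is the filesystem ID as 8-digit
    # lowercase hex. Verify against the actual directory under StorageSpace.
    return format(fs_id, "08x")

def db_display_command(storage_space: str, fs_id: int, output_file: str) -> str:
    """Build a shell command that runs pvfs2-db-display and captures its output."""
    args = ["./pvfs2-db-display",
            "--dbpath", storage_space,
            "--hexdir", hexdir_for(fs_id)]
    cmd = " ".join(shlex.quote(a) for a in args)
    # Redirect stdout and stderr together so error records end up in the same file.
    return f"{cmd} > {shlex.quote(output_file)} 2>&1"

print(db_display_command("/mnt/pvfs2", 62659950, "db-display.out"))
# ./pvfs2-db-display --dbpath /mnt/pvfs2 --hexdir 03bc1d6e > db-display.out 2>&1
```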
>>>>>
>>>>> Thanks,
>>>>> Elaine
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jim Kusznir [mailto:[email protected]]
>>>>> Sent: Monday, April 02, 2012 5:57 PM
>>>>> To: [email protected]
>>>>> Cc: [email protected]; [email protected];
>>>>> [email protected]
>>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>>>> detected
>>>>>
>>>>> If this is the recommended method for recovery, then let's do it.
>>>>>
>>>>> Just one more question on how pvfs2 runs: is the metadata contained
>>>>> on each server different, or should they all be identical copies?  It
>>>>> just occurred to me that my understanding of the metadata was that
>>>>> all three metadata servers were redundant.....  Or is this a
>>>>> "different metadata" db?
>>>>>
>>>>> --Jim
>>>>>
>>>>> On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon<[email protected]>  wrote:
>>>>>
>>>>>> Jim:
>>>>>>
>>>>>> We have a program called pvfs2-db-display that reads directly
>>>>>> through the Berkeley DB.  We don't know for sure, but we might be
>>>>>> able to use whatever information it will give to recover what we
>>>>>> can.  The program reads the database from logical top to
>>>>>> bottom.  We can also change it to read from logical bottom to top.
>>>>>> In this way, we MAY be able to recover the good data that is still
>>>>>> there above and below the corrupted area.  We've never done this but
>>>>>> we are willing to give it a try.
>>>>>>
>>>>>> Let us know if you'd like to try this!
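The idea of reading from both ends can be illustrated with a small Python sketch. This is a simulation over an in-memory list, not Berkeley DB code and not how pvfs2-db-display is implemented; `salvage` and `read_record` are hypothetical names used only to show the control flow.

```python
def salvage(read_record, count):
    """Collect readable records from both ends of a damaged sequence.

    read_record(i) returns record i, or raises if the record is unreadable.
    """
    top = []
    i = 0
    while i < count:
        try:
            top.append(read_record(i))
        except Exception:
            break  # first unreadable record when scanning top to bottom
        i += 1

    bottom = []
    j = count - 1
    while j > i:  # never re-read the record that already failed going forward
        try:
            bottom.append(read_record(j))
        except Exception:
            break  # first unreadable record when scanning bottom to top
        j -= 1
    bottom.reverse()  # restore logical order
    return top, bottom

records = ["a", "b", None, None, "e", "f"]  # None marks a damaged record

def read_record(i):
    if records[i] is None:
        raise IOError(f"record {i} is unreadable")
    return records[i]

top, bottom = salvage(read_record, len(records))
print(top, bottom)  # ['a', 'b'] ['e', 'f']
```

Everything between the two failure points is treated as lost in this sketch; a real tool might instead try to skip individual bad records.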
>>>>>>
>>>>>> Becky
>>>>>> --
>>>>>> Becky Ligon
>>>>>> HPC Admin Staff
>>>>>> PVFS/OrangeFS Developer
>>>>>> Clemson University/Omnibond.com OrangeFS Support
>>>>>> 864-650-4065
>>>>>>
>>>>>>> Your solution sounds like what I am trying to do; I'd prefer to
>>>>>>> install db4 into /opt.
>>>>>>>
>>>>>>> If I can get your spec file or srpm, I'd greatly appreciate it!
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon<[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Jim:
>>>>>>>>
>>>>>>>> We downloaded the software from the Oracle site and created an rpm
>>>>>>>> from that.  We are running Centos5 on our production servers with
>>>>>>>> kernel=2.6.18-238.9.1.el5 and have been running a version of db4
>>>>>>>> for at least the past 3 years.  So, you should be able to create
>>>>>>>> the rpm.  I can send you the rpm that we are using but it is
>>>>>>>> tailored to our environment; we install db4 in /opt/db4, because
>>>>>>>> other items depend on the installed version.
>>>>>>>>
>>>>>>>> Becky
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir<[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've been trying to build a db4 rpm on my centos box, but it
>>>>>>>>> appears it has dependencies that require an OS upgrade...how did
>>>>>>>>> you get anything newer than the stock db4 installed on centos5?
>>>>>>>>>
>>>>>>>>> --Jim
>>>>>>>>>
>>>>>>>>> On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon<[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Jim:
>>>>>>>>>>
>>>>>>>>>> I understand your situation.  Here at Clemson University, we
>>>>>>>>>> went through the same situation a couple of years ago.  Now, we
>>>>>>>>>> back up the metadata databases.  We don't have the space to back
>>>>>>>>>> up our data either!
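A minimal sketch of such a metadata backup in Python follows. The file names are the ones listed in the pvfs2-db-display usage text elsewhere in this thread; the function name, the timestamped directory layout, and the idea of copying while the server is stopped (so the files form a consistent set) are assumptions for illustration, not a description of the procedure used at Clemson.

```python
import shutil
import time
from pathlib import Path

def backup_metadata(storage_space, hexdir, backup_root):
    """Copy one server's Berkeley DB files into a new timestamped directory.

    ASSUMPTION: run this with pvfs2-server stopped so the copies are consistent.
    """
    src = Path(storage_space)
    dest = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    # Top-level databases, then the per-collection databases under hexdir.
    wanted = [src / "collections.db", src / "storage_attributes.db"]
    wanted += [src / hexdir / name for name in
               ("collection_attributes.db", "dataspace_attributes.db", "keyval.db")]
    copied = []
    for path in wanted:
        target = dest / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # raises if a database file is missing
        copied.append(target)
    return copied

# Hypothetical usage; the hexdir and backup location depend on your installation:
# backup_metadata("/mnt/pvfs2", "<hexdir>", "/root/pvfs2-meta-backups")
```

Taking the copies on all servers at about the same time keeps the databases roughly consistent with each other, which matters if they ever need to be restored as a set.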
>>>>>>>>>>
>>>>>>>>>> Under no circumstances should you run pvfs2-fsck.  If you run
>>>>>>>>>> this command in destructive mode, we won't be able to help at
>>>>>>>>>> all.  If you're willing, Omnibond MAY be able to write some
>>>>>>>>>> utilities that will help you recover most of the data.  You
>>>>>>>>>> will have to speak to Boyd Wilson ([email protected]) and work
>>>>>>>>>> something out.
>>>>>>>>>>
>>>>>>>>>> Becky Ligon
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir
>>>>>>>>>> <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I made no changes to my environment; it was up and running
>>>>>>>>>>> just fine.
>>>>>>>>>>>
>>>>>>>>>>> I ran db_recover, and it immediately returned, with no
>>>>>>>>>>> apparent sign of doing anything but creating a log.000000001
>>>>>>>>>>> file.
>>>>>>>>>>>
>>>>>>>>>>> I have the centos DB installed, db4-4.3.29-10.el5
>>>>>>>>>>>
>>>>>>>>>>> I have no backups; this is my high performance filesystem of
>>>>>>>>>>> 99TB; it is the largest disk we have and we therefore have no
>>>>>>>>>>> means of backing it up.  We don't have anything big enough to
>>>>>>>>>>> hold that much data.
>>>>>>>>>>>
>>>>>>>>>>> Is there any hope?  Can we just identify and delete the files
>>>>>>>>>>> that have the db damage on them?  (Note that I don't even have
>>>>>>>>>>> anywhere to back up this data to temporarily if we do get it
>>>>>>>>>>> running, so I'd need to "fix in place".)
>>>>>>>>>>>
>>>>>>>>>>> thanks!
>>>>>>>>>>> --Jim
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon
>>>>>>>>>>> <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Jim:
>>>>>>>>>>>>
>>>>>>>>>>>> If you haven't made any recent changes to your pvfs
>>>>>>>>>>>> environment or Berkeley DB installation, then it looks like
>>>>>>>>>>>> you have a corrupted metadata database.
>>>>>>>>>>>> There is no way to easily recover.  Sometimes, the Berkeley
>>>>>>>>>>>> DB command "db_recover" might work, but PVFS doesn't have
>>>>>>>>>>>> transactions turned on, so normally it doesn't work.  It's
>>>>>>>>>>>> worth a try, just to be sure.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have any recent backups of the databases?  If so,
>>>>>>>>>>>> then you will need to use a set of backups that were created
>>>>>>>>>>>> around the same time, so the databases will be somewhat
>>>>>>>>>>>> consistent with each other.
>>>>>>>>>>>>
>>>>>>>>>>>> Which version of Berkeley DB are you using?  We have had
>>>>>>>>>>>> corruption issues with older versions of it.  We strongly
>>>>>>>>>>>> recommend 4.8 or higher.  There are some known problems with
>>>>>>>>>>>> threads in the older versions.
>>>>>>>>>>>>
>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir
>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I got some notices from my users about "weirdness with pvfs2"
>>>>>>>>>>>>> this morning, and went and investigated.  Eventually, I
>>>>>>>>>>>>> found the following on one of my 3 servers:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>>>>>>>> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>>>>>>>> [E 03/30 12:23] Warning: skipping entry.
>>>>>>>>>>>>> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>>>>>>>> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>>>>>>>> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------
>>>>>>>>>>>>> pvfs2-fs.conf:
>>>>>>>>>>>>> -----------
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Defaults>
>>>>>>>>>>>>>        UnexpectedRequests 50
>>>>>>>>>>>>>        EventLogging none
>>>>>>>>>>>>>        LogStamp datetime
>>>>>>>>>>>>>        BMIModules bmi_tcp
>>>>>>>>>>>>>        FlowModules flowproto_multiqueue
>>>>>>>>>>>>>        PerfUpdateInterval 1000
>>>>>>>>>>>>>        ServerJobBMITimeoutSecs 30
>>>>>>>>>>>>>        ServerJobFlowTimeoutSecs 30
>>>>>>>>>>>>>        ClientJobBMITimeoutSecs 300
>>>>>>>>>>>>>        ClientJobFlowTimeoutSecs 300
>>>>>>>>>>>>>        ClientRetryLimit 5
>>>>>>>>>>>>>        ClientRetryDelayMilliSecs 2000
>>>>>>>>>>>>>        StorageSpace /mnt/pvfs2
>>>>>>>>>>>>>        LogFile /var/log/pvfs2-server.log
>>>>>>>>>>>>> </Defaults>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Aliases>
>>>>>>>>>>>>>        Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>>>>>>>>        Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>>>>>>>>        Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>>>>>>>> </Aliases>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Filesystem>
>>>>>>>>>>>>>        Name pvfs2-fs
>>>>>>>>>>>>>        ID 62659950
>>>>>>>>>>>>>        RootHandle 1048576
>>>>>>>>>>>>>        <MetaHandleRanges>
>>>>>>>>>>>>>                Range pvfs2-io-0-0 4-715827885
>>>>>>>>>>>>>                Range pvfs2-io-0-1 715827886-1431655767
>>>>>>>>>>>>>                Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>>>>>>>>        </MetaHandleRanges>
>>>>>>>>>>>>>        <DataHandleRanges>
>>>>>>>>>>>>>                Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>>>>>>>>                Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>>>>>>>>                Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>>>>>>>>        </DataHandleRanges>
>>>>>>>>>>>>>        <StorageHints>
>>>>>>>>>>>>>                TroveSyncMeta yes
>>>>>>>>>>>>>                TroveSyncData no
>>>>>>>>>>>>>        </StorageHints>
>>>>>>>>>>>>> </Filesystem>
>>>>>>>>>>>>> -------------
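The handle range in the "Error adding handle range" line can be cross-checked against this configuration. The Python sketch below copies the six ranges from the config above, checks that they are contiguous and non-overlapping, and confirms that the range in the error message is simply the meta and data ranges of pvfs2-io-0-2 joined by a comma. The helper names are made up for this example, and a clean result only suggests (it does not prove) that the handle ranges in the configuration are not the source of the failure.

```python
# Ranges copied from the <MetaHandleRanges> and <DataHandleRanges> sections.
META = [("pvfs2-io-0-0", "4-715827885"),
        ("pvfs2-io-0-1", "715827886-1431655767"),
        ("pvfs2-io-0-2", "1431655768-2147483649")]
DATA = [("pvfs2-io-0-0", "2147483650-2863311531"),
        ("pvfs2-io-0-1", "2863311532-3579139413"),
        ("pvfs2-io-0-2", "3579139414-4294967295")]

def parse_range(text):
    low, high = text.split("-")
    return int(low), int(high)

def find_problems(named_ranges):
    """Report inverted, overlapping, or non-adjacent handle ranges."""
    spans = sorted(parse_range(r) + (name,) for name, r in named_ranges)
    problems = [f"{n} has low > high" for lo, hi, n in spans if lo > hi]
    for (lo1, hi1, n1), (lo2, hi2, n2) in zip(spans, spans[1:]):
        if lo2 <= hi1:
            problems.append(f"{n1} and {n2} overlap")
        elif lo2 != hi1 + 1:
            problems.append(f"gap between {n1} and {n2}")
    return problems

def server_ranges(server):
    # Meta range, then data range, in the form used by the error message.
    return f"{dict(META)[server]},{dict(DATA)[server]}"

problems = find_problems(META + DATA)
print(problems)  # []
print(server_ranges("pvfs2-io-0-2") ==
      "1431655768-2147483649,3579139414-4294967295")  # True
```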
>>>>>>>>>>>>> Any suggestions for recovery?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Pvfs2-users mailing list
>>>>>>>>>>>>> Pvfs2-users@beowulf-underground.org
>>>>>>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>> OrangeFS Support and Development Omnibond Systems Anderson,
>>>>>>>>>>>> South Carolina
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Becky Ligon
>>>>>>>>>> OrangeFS Support and Development Omnibond Systems Anderson,
>>>>>>>>>> South Carolina
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Becky Ligon
>>>>>>>> OrangeFS Support and Development
>>>>>>>> Omnibond Systems
>>>>>>>> Anderson, South Carolina
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>
>



-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
