Phil/Jim: Should you run a pvfs2-fsck at this point, maybe in non-destructive mode, to see if we have dangling entries?
Becky

On Thu, Apr 5, 2012 at 4:27 PM, Phil Carns <[email protected]> wrote:

> On 04/05/2012 01:47 PM, Jim Kusznir wrote:
>
>> I think it's repaired. After using Phil's method, I got a file for which
>> the pvfs2-display displayed all content, so I started the server and
>> got:
>> [S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>> [E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>> [E 04/05 10:45] Warning: skipping entry.
>> [S 04/05 10:45] PVFS2 Server ready.
>>
>> I believe this means recovery is as complete as possible, and that
>> there's an entry that's missing now, is this correct?
>
> At the very least, the .db file that you have now is entirely valid from
> Berkeley DB's point of view. It looks like there is a stray entry in there
> that PVFS doesn't understand, but it shouldn't interfere with anything.
> You will just see that warning when you start the server.
>
>> Is it ready to go back into production (once I update versions of db
>> and pvfs2)?
>
> I would think so. You mentioned originally that some users were seeing
> some "weirdness", so maybe you can get someone to check whatever data they
> were working with before to see if it looks ok.
>
> -Phil
>
>> --Jim
>>
>> On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles <[email protected]> wrote:
>>
>>> Try "make develtools".
>>>
>>> -- Elaine
>>>
>>> -----Original Message-----
>>> From: Jim Kusznir [mailto:[email protected]]
>>> Sent: Wednesday, April 04, 2012 3:45 PM
>>> To: Elaine Quarles
>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>
>>> I patched everything and ran configure and make, but it didn't build
>>> pvfs2-db-display. The .c file is present. I haven't found the magic
>>> make command to cause that to be built either...Suggestions?
>>>
>>> --Jim
>>>
>>> On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles <[email protected]> wrote:
>>>
>>>> Sorry for the delay.
>>>> Attached are db-display.tar. If you expand this
>>>> from the top level directory of your source tree it will create the
>>>> src/apps/devel directory. Makefile.in.patch will patch your
>>>> Makefile.in with the logic necessary to build pvfs2-db-display. Please
>>>> note that it is necessary to run the configure script to update your
>>>> Makefile.
>>>>
>>>> Please send the results of running this utility so we can determine
>>>> whether it is necessary to try continuous forward reading through the
>>>> database, skipping error records or whether we will have to also read
>>>> from the end of the database backwards.
>>>>
>>>> Thanks,
>>>> Elaine
>>>>
>>>> -----Original Message-----
>>>> From: Jim Kusznir [mailto:[email protected]]
>>>> Sent: Wednesday, April 04, 2012 1:56 PM
>>>> To: Elaine Quarles
>>>> Cc: Becky Ligon
>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>>
>>>> Any updates? My entire cluster is still offline due to this problem,
>>>> and my users are starting to look for their pitchforks....
>>>>
>>>> Thanks!
>>>> --Jim
>>>>
>>>> On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles <[email protected]> wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Could you please check whether your pvfs 2.8.2 distribution contains
>>>>> src/apps/devel/pvfs2-db-display.c? If so you can build it by running
>>>>> "make develtools". If your distribution does not contain this file
>>>>> let me know and I will send a patch.
>>>>>
>>>>> If you already have the utility, please redirect the output and send
>>>>> it so we can see what it has to say about the state of the database
>>>>> and determine the next step from there.
>>>>>
>>>>> Here is the command-line format.
>>>>> Usage: ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
>>>>> Example: ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
>>>>>
>>>>> Options:
>>>>>   --verbose        Enable verbose output
>>>>>   --help           This message.
>>>>>   --dbpath <path>  The path of the server's StorageSpace. The path
>>>>>                    should contain collections.db and
>>>>>                    storage_attributes.db
>>>>>   --hexdir <dir>   The directory in dbpath that contains
>>>>>                    collection_attributes.db, dataspace_attributes.db
>>>>>                    and keyval.db
>>>>>
>>>>> Thanks,
>>>>> Elaine
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jim Kusznir [mailto:[email protected]]
>>>>> Sent: Monday, April 02, 2012 5:57 PM
>>>>> To: [email protected]
>>>>> Cc: [email protected]; [email protected]; [email protected]
>>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>>>
>>>>> If this is the recommended method for recovery, then let's do it.
>>>>>
>>>>> Just one more question on how pvfs2 runs: is the metadata contained
>>>>> on each server different, or should they all be identical copies? It
>>>>> just occurred to me that my understanding of the metadata was that
>>>>> all three metadata servers were redundant..... Or is this a
>>>>> "different metadata" db?
>>>>>
>>>>> --Jim
>>>>>
>>>>> On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon <[email protected]> wrote:
>>>>>
>>>>>> Jim:
>>>>>>
>>>>>> We have a program called pvfs2-db-display that reads directly
>>>>>> through the Berkeley DB. We don't know for sure, but we might be
>>>>>> able to use whatever information it will give to recover what we
>>>>>> can. The program reads from the database from logical top to
>>>>>> bottom. We can also change it to read from logical bottom to top.
>>>>>> In this way, we MAY be able to recover the good data that is still
>>>>>> there above and below the corrupted area. We've never done this but
>>>>>> we are willing to give it a try.
>>>>>>
>>>>>> Let us know if you'd like to try this!
>>>>>>
>>>>>> Becky
>>>>>> --
>>>>>> Becky Ligon
>>>>>> HPC Admin Staff
>>>>>> PVFS/OrangeFS Developer
>>>>>> Clemson University/Omnibond.com OrangeFS Support
>>>>>> 864-650-4065
>>>>>>
>>>>>>> Your solution sounds like what I am trying to do; I'd prefer to
>>>>>>> install db4 into /opt.
>>>>>>>
>>>>>>> If I can get your spec file or srpm, I'd greatly appreciate it!
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon <[email protected]> wrote:
>>>>>>>
>>>>>>>> Jim:
>>>>>>>>
>>>>>>>> We downloaded the software from the Oracle site and created an rpm
>>>>>>>> from that. We are running Centos5 on our production servers with
>>>>>>>> kernel=2.6.18-238.9.1.el5 and have been running a version of db4
>>>>>>>> for at least the past 3 years. So, you should be able to create
>>>>>>>> the rpm. I can send you the rpm that we are using but it is
>>>>>>>> tailored to our environment; we install db4 in /opt/db4, because
>>>>>>>> other items depend on the installed version.
>>>>>>>>
>>>>>>>> Becky
>>>>>>>>
>>>>>>>> On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I've been trying to build a db4 rpm on my centos box, but it
>>>>>>>>> appears it has dependencies that require an OS upgrade...how did
>>>>>>>>> you get anything newer than the stock db4 installed on centos5?
>>>>>>>>>
>>>>>>>>> --Jim
>>>>>>>>>
>>>>>>>>> On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Jim:
>>>>>>>>>>
>>>>>>>>>> I understand your situation. Here at Clemson University, we
>>>>>>>>>> went through the same situation a couple of years ago. Now, we
>>>>>>>>>> back up the metadata databases. We don't have the space to back
>>>>>>>>>> up our data either!
>>>>>>>>>>
>>>>>>>>>> Under no circumstances should you run pvfs2-fsck.
>>>>>>>>>> If you do, then we won't be able to help at all, if you run this
>>>>>>>>>> command in the destructive mode.
>>>>>>>>>>
>>>>>>>>>> If you're willing, Omnibond MAY be able to write some utilities
>>>>>>>>>> that will help you recover most of the data. You will have to
>>>>>>>>>> speak to Boyd Wilson ([email protected]) and work out something.
>>>>>>>>>>
>>>>>>>>>> Becky Ligon
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I made no changes to my environment; it was up and running just
>>>>>>>>>>> fine. I ran db_recover, and it immediately returned, with no
>>>>>>>>>>> apparent sign of doing anything but creating a log.000000001
>>>>>>>>>>> file.
>>>>>>>>>>>
>>>>>>>>>>> I have the centos DB installed, db4-4.3.29-10.el5
>>>>>>>>>>>
>>>>>>>>>>> I have no backups; this is my high performance filesystem of
>>>>>>>>>>> 99TB; it is the largest disk we have and therefore have no means
>>>>>>>>>>> of backing it up. We don't have anything big enough to hold that
>>>>>>>>>>> much data.
>>>>>>>>>>>
>>>>>>>>>>> Is there any hope? Can we just identify and delete the files
>>>>>>>>>>> that have the db damage on it? (Note that I don't even have
>>>>>>>>>>> anywhere to back up this data to temporarily if we do get it
>>>>>>>>>>> running, so I'd need to "fix in place".)
>>>>>>>>>>>
>>>>>>>>>>> thanks!
>>>>>>>>>>> --Jim
>>>>>>>>>>>
>>>>>>>>>>> --Jim
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Jim:
>>>>>>>>>>>>
>>>>>>>>>>>> If you haven't made any recent changes to your pvfs
>>>>>>>>>>>> environment or Berkeley DB installation, then it looks like
>>>>>>>>>>>> you have a corrupted metadata database.
>>>>>>>>>>>> There is no way to easily recover. Sometimes, the Berkeley
>>>>>>>>>>>> DB command "db_recover" might work, but PVFS doesn't have
>>>>>>>>>>>> transactions turned on, so normally it doesn't work. It's
>>>>>>>>>>>> worth a try, just to be sure.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have any recent backups of the databases? If so,
>>>>>>>>>>>> then you will need to use a set of backups that were created
>>>>>>>>>>>> around the same time, so the databases will be somewhat
>>>>>>>>>>>> consistent with each other.
>>>>>>>>>>>>
>>>>>>>>>>>> Which version of Berkeley DB are you using? We have had
>>>>>>>>>>>> corruption issues with older versions of it. We strongly
>>>>>>>>>>>> recommend 4.8 or higher. There are some known problems with
>>>>>>>>>>>> threads in the older versions.
>>>>>>>>>>>>
>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I got some notices from my users with "weirdness with pvfs2"
>>>>>>>>>>>>> this morning, and went and investigated. Eventually, I
>>>>>>>>>>>>> found the following on one of my 3 servers:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>>>>>>>> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>>>>>>>> [E 03/30 12:23] Warning: skipping entry.
>>>>>>>>>>>>> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>>>>>>>> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>>>>>>>> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------
>>>>>>>>>>>>> pvfs2-fs.conf:
>>>>>>>>>>>>> -----------
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Defaults>
>>>>>>>>>>>>>     UnexpectedRequests 50
>>>>>>>>>>>>>     EventLogging none
>>>>>>>>>>>>>     LogStamp datetime
>>>>>>>>>>>>>     BMIModules bmi_tcp
>>>>>>>>>>>>>     FlowModules flowproto_multiqueue
>>>>>>>>>>>>>     PerfUpdateInterval 1000
>>>>>>>>>>>>>     ServerJobBMITimeoutSecs 30
>>>>>>>>>>>>>     ServerJobFlowTimeoutSecs 30
>>>>>>>>>>>>>     ClientJobBMITimeoutSecs 300
>>>>>>>>>>>>>     ClientJobFlowTimeoutSecs 300
>>>>>>>>>>>>>     ClientRetryLimit 5
>>>>>>>>>>>>>     ClientRetryDelayMilliSecs 2000
>>>>>>>>>>>>>     StorageSpace /mnt/pvfs2
>>>>>>>>>>>>>     LogFile /var/log/pvfs2-server.log
>>>>>>>>>>>>> </Defaults>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Aliases>
>>>>>>>>>>>>>     Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>>>>>>>>     Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>>>>>>>>     Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>>>>>>>> </Aliases>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Filesystem>
>>>>>>>>>>>>>     Name pvfs2-fs
>>>>>>>>>>>>>     ID 62659950
>>>>>>>>>>>>>     RootHandle 1048576
>>>>>>>>>>>>>     <MetaHandleRanges>
>>>>>>>>>>>>>         Range pvfs2-io-0-0 4-715827885
>>>>>>>>>>>>>         Range pvfs2-io-0-1 715827886-1431655767
>>>>>>>>>>>>>         Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>>>>>>>>     </MetaHandleRanges>
>>>>>>>>>>>>>     <DataHandleRanges>
>>>>>>>>>>>>>         Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>>>>>>>>         Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>>>>>>>>         Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>>>>>>>>     </DataHandleRanges>
>>>>>>>>>>>>>     <StorageHints>
>>>>>>>>>>>>>         TroveSyncMeta yes
>>>>>>>>>>>>>         TroveSyncData no
>>>>>>>>>>>>>     </StorageHints>
>>>>>>>>>>>>> </Filesystem>
>>>>>>>>>>>>> -------------
>>>>>>>>>>>>> Any suggestions for recovery?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Pvfs2-users mailing list
>>>>>>>>>>>>> Pvfs2-users@beowulf-underground.org
>>>>>>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>> OrangeFS Support and Development
>>>>>>>>>>>> Omnibond Systems
>>>>>>>>>>>> Anderson, South Carolina
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Becky Ligon
>>>>>>>>>> OrangeFS Support and Development
>>>>>>>>>> Omnibond Systems
>>>>>>>>>> Anderson, South Carolina
>>>>>>>>
>>>>>>>> --
>>>>>>>> Becky Ligon
>>>>>>>> OrangeFS Support and Development
>>>>>>>> Omnibond Systems
>>>>>>>> Anderson, South Carolina

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
