So, as part of the upgrades, I'm also upgrading to the latest release of OrangeFS. I haven't actually installed the RPM yet (I've been fighting with building it for a while and still don't quite have it the way I want it), but once I do, is there anything special I need to know? This will use db4-4.8.30 and OrangeFS 2.8.5.
--Jim

On Thu, Apr 5, 2012 at 2:48 PM, Jim Kusznir wrote:
> It did complete successfully.
>
> Thanks!
> --Jim
>
> On Thu, Apr 5, 2012 at 2:40 PM, Becky Ligon wrote:
>> Good point, Phil.
>>
>> Jim:
>>
>> Was pvfs2-db-display able to walk the rebuilt metadata database without any issues? If so, then at least we know that the corrupted space is gone. This does not mean that the relationships between the data are intact.
>>
>> Becky
>>
>> On Thu, Apr 5, 2012 at 5:28 PM, Phil Carns wrote:
>>>
>>> I haven't used pvfs2-fsck or pvfs2-validate first hand in a long time, so I'm probably not a good person to ask on that :)
>>>
>>> Personally I would be inclined to just run "pvfs2-ls -lR /mnt/pvfs2 >& some_log_file.txt" and grep in there for errors. The -R makes it recursive so that it will walk the whole file system. The -l makes it retrieve the size of each file, which will implicitly check that they are all intact.
>>>
>>> The reason that I suggest this is that pvfs2-ls (although still slow when it has to go through so many files) should be considerably faster than pvfs2-fsck. pvfs2-ls batches getattr operations, uses less memory, and makes no effort to account for stranded objects (which in most cases aren't critical anyway). On occasions where we have used pvfs2-fsck on busy file systems at ANL, it has taken an extraordinary amount of time to finish.
>>>
>>> Depending on your time constraints and how many files you have, you might just want to run the pvfs2-ls check on critical directories, or else run it from root and ctrl-c if it takes too long, once you at least have some confidence that a significant sampling of the files is intact.
>>>
>>> -Phil
>>>
>>>
>>> On 04/05/2012 05:08 PM, Becky Ligon wrote:
>>>
>>> Phil/Jim:
>>>
>>> Should you run a pvfs2-fsck at this point, maybe in non-destructive mode, to see if we have dangling entries?
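Phil's pvfs2-ls check leaves a log that still has to be searched by hand. As a minimal sketch, assuming the output of `pvfs2-ls -lR /mnt/pvfs2 >& some_log_file.txt` has been saved, the script below does the "grep in there for errors" step and reports line numbers. The pattern is an assumption: the thread never shows what a pvfs2-ls failure line looks like, so `ERROR_PATTERN` simply matches the standalone word "error" in any case, and `find_error_lines` is a hypothetical helper name. It is a coarse filter; a file whose name contains "error" as a separate word (for example `error.log`) will also be reported.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: lines reporting a failure contain the standalone word "error"
# (any case). The exact wording pvfs2-ls uses is not quoted in this thread,
# so adjust ERROR_PATTERN after looking at a real log.
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def find_error_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def main(path: str) -> int:
    """Scan a log written by: pvfs2-ls -lR /mnt/pvfs2 >& some_log_file.txt"""
    with open(path, encoding="utf-8", errors="replace") as log:
        hits = find_error_lines(log)
    for number, text in hits:
        print(f"{path}:{number}: {text}")
    print(f"{len(hits)} suspicious line(s) found")
    return 1 if hits else 0  # non-zero exit status when anything matched


# Only scan when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it as `python3 scan_ls_log.py some_log_file.txt` (the script name is arbitrary). Because it exits with status 1 when anything matched, it can also gate a scripted check of just the critical directories.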
>>>
>>> Becky
>>>
>>> On Thu, Apr 5, 2012 at 4:27 PM, Phil Carns wrote:
>>>>
>>>> On 04/05/2012 01:47 PM, Jim Kusznir wrote:
>>>>>
>>>>> I think it's repaired. After using Phil's method, I got a file for which pvfs2-db-display displayed all content, so I started the server and got:
>>>>> [S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>> [E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>> [E 04/05 10:45] Warning: skipping entry.
>>>>> [S 04/05 10:45] PVFS2 Server ready.
>>>>>
>>>>> I believe this means recovery is as complete as possible, and that there's an entry that's missing now. Is this correct?
>>>>
>>>>
>>>> At the very least, the .db file that you have now is entirely valid from Berkeley DB's point of view. It looks like there is a stray entry in there that PVFS doesn't understand, but it shouldn't interfere with anything. You will just see that warning when you start the server.
>>>>
>>>>
>>>>> Is it ready to go back into production (once I update versions of db and pvfs2)?
>>>>
>>>>
>>>> I would think so. You mentioned originally that some users were seeing some "weirdness", so maybe you can ask someone to check whatever data they were working with before to see if it looks ok.
>>>>
>>>> -Phil
>>>>
>>>>>
>>>>> --Jim
>>>>>
>>>>>
>>>>> On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles wrote:
>>>>>>
>>>>>> Try "make develtools".
>>>>>>
>>>>>> -- Elaine
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jim Kusznir
>>>>>> Sent: Wednesday, April 04, 2012 3:45 PM
>>>>>> To: Elaine Quarles
>>>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>>>>
>>>>>> I patched everything and ran configure and make, but it didn't build pvfs2-db-display. The .c file is present.
>>>>>> I haven't found the magic make command to cause that to be built either... Suggestions?
>>>>>>
>>>>>> --Jim
>>>>>>
>>>>>> On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles wrote:
>>>>>>>
>>>>>>> Sorry for the delay. Attached is db-display.tar. If you expand this from the top-level directory of your source tree, it will create the src/apps/devel directory. Makefile.in.patch will patch your Makefile.in with the logic necessary to build pvfs2-db-display. Please note that it is necessary to run the configure script to update your Makefile.
>>>>>>>
>>>>>>> Please send the results of running this utility so we can determine whether it is necessary to try continuous forward reading through the database, skipping error records, or whether we will have to also read from the end of the database backwards.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Elaine
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Jim Kusznir
>>>>>>> Sent: Wednesday, April 04, 2012 1:56 PM
>>>>>>> To: Elaine Quarles
>>>>>>> Cc: Becky Ligon
>>>>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>>>>>
>>>>>>> Any updates? My entire cluster is still offline due to this problem, and my users are starting to look for their pitchforks....
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles wrote:
>>>>>>>>
>>>>>>>> Jim,
>>>>>>>>
>>>>>>>> Could you please check whether your pvfs 2.8.2 distribution contains src/apps/devel/pvfs2-db-display.c? If so, you can build it by running "make develtools". If your distribution does not contain this file, let me know and I will send a patch.
>>>>>>>>
>>>>>>>> If you already have the utility, please redirect the output and send it so we can see what it has to say about the state of the database and determine the next step from there.
>>>>>>>>
>>>>>>>> Here is the command-line format.
>>>>>>>> Usage: ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
>>>>>>>> Example: ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
>>>>>>>>
>>>>>>>> Options:
>>>>>>>>   --verbose        Enable verbose output
>>>>>>>>   --help           This message.
>>>>>>>>   --dbpath <path>  The path of the server's StorageSpace. The path should contain collections.db and storage_attributes.db
>>>>>>>>   --hexdir <dir>   The directory in dbpath that contains collection_attributes.db, dataspace_attributes.db and keyval.db
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Elaine
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Jim Kusznir
>>>>>>>> Sent: Monday, April 02, 2012 5:57 PM
>>>>>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>>>>>>
>>>>>>>> If this is the recommended method for recovery, then let's do it.
>>>>>>>>
>>>>>>>> Just one more question on how pvfs2 runs: is the metadata contained on each server different, or should they all be identical copies? It just occurred to me that my understanding of the metadata was that all three metadata servers were redundant..... Or is this a "different metadata" db?
>>>>>>>>
>>>>>>>> --Jim
>>>>>>>>
>>>>>>>> On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon wrote:
>>>>>>>>>
>>>>>>>>> Jim:
>>>>>>>>>
>>>>>>>>> We have a program called pvfs2-db-display that reads directly through the Berkeley DB.
>>>>>>>>> We don't know for sure, but we might be able to use whatever information it gives us to recover what we can. The program reads the database from logical top to bottom. We can also change it to read from logical bottom to top. In this way, we MAY be able to recover the good data that is still there above and below the corrupted area. We've never done this, but we are willing to give it a try.
>>>>>>>>>
>>>>>>>>> Let us know if you'd like to try this!
>>>>>>>>>
>>>>>>>>> Becky
>>>>>>>>> --
>>>>>>>>> Becky Ligon
>>>>>>>>> HPC Admin Staff
>>>>>>>>> PVFS/OrangeFS Developer
>>>>>>>>> Clemson University/Omnibond.com OrangeFS Support
>>>>>>>>> 864-650-4065
>>>>>>>>>
>>>>>>>>>> Your solution sounds like what I am trying to do; I'd prefer to install db4 into /opt.
>>>>>>>>>>
>>>>>>>>>> If I can get your spec file or SRPM, I'd greatly appreciate it!
>>>>>>>>>>
>>>>>>>>>> --Jim
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon wrote:
>>>>>>>>>>>
>>>>>>>>>>> Jim:
>>>>>>>>>>>
>>>>>>>>>>> We downloaded the software from the Oracle site and created an RPM from that. We are running CentOS 5 on our production servers with kernel=2.6.18-238.9.1.el5 and have been running a version of db4 for at least the past 3 years. So, you should be able to create the RPM. I can send you the RPM that we are using, but it is tailored to our environment; we install db4 in /opt/db4, because other items depend on the installed version.
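Becky's idea of reading the database from the logical top and then from the logical bottom can be illustrated with a toy model. This is not pvfs2-db-display and does not open a Berkeley DB file; it is a minimal sketch, assuming the records are an ordered list and that reading a damaged record raises an error the reader cannot step past. `CorruptRecord`, `scan`, `recover_from_both_ends`, and `read_record` are hypothetical names. The forward pass keeps everything before the first damaged record, the backward pass keeps everything after the last one, and the result is their union in the original order.

```python
from typing import Callable, List, Sequence


class CorruptRecord(Exception):
    """Raised by a reader when a record cannot be decoded."""


def scan(records: Sequence[str], read: Callable[[str], str], reverse: bool = False) -> List[str]:
    """Read records in order (or in reverse) and stop at the first unreadable one."""
    order = reversed(records) if reverse else iter(records)
    recovered = []
    for raw in order:
        try:
            recovered.append(read(raw))
        except CorruptRecord:
            break  # in this model a cursor cannot step past a damaged record
    return recovered


def recover_from_both_ends(records: Sequence[str], read: Callable[[str], str]) -> List[str]:
    """Union of a top-to-bottom pass and a bottom-to-top pass, in original order."""
    forward = scan(records, read)
    if len(forward) == len(records):
        return forward  # nothing was damaged, one pass was enough
    backward = scan(records, read, reverse=True)
    backward.reverse()  # put the tail back in its original order
    return forward + backward


def read_record(raw: str) -> str:
    """Toy reader: the literal string "BAD" stands in for a corrupted entry."""
    if raw == "BAD":
        raise CorruptRecord(raw)
    return raw


print(recover_from_both_ends(["a", "b", "BAD", "c", "BAD", "d", "e"], read_record))
# prints ['a', 'b', 'd', 'e']
```

Record `c` is lost even though it is readable, because it sits between two damaged entries. That gap is what the other option Elaine mentions, reading forward while skipping error records, would cover.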
>>>>>>>>>>>
>>>>>>>>>>> Becky
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I've been trying to build a db4 RPM on my CentOS box, but it appears it has dependencies that require an OS upgrade... How did you get anything newer than the stock db4 installed on CentOS 5?
>>>>>>>>>>>>
>>>>>>>>>>>> --Jim
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jim:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I understand your situation. Here at Clemson University, we went through the same situation a couple of years ago. Now, we back up the metadata databases. We don't have the space to back up our data either!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Under no circumstances should you run pvfs2-fsck. If you run this command in destructive mode, we won't be able to help at all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you're willing, Omnibond MAY be able to write some utilities that will help you recover most of the data. You will have to speak to Boyd Wilson and work something out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I made no changes to my environment; it was up and running just fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I ran db_recover, and it immediately returned, with no apparent sign of doing anything but creating a log.000000001 file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the CentOS DB installed, db4-4.3.29-10.el5
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have no backups; this is my high-performance filesystem of 99TB. It is the largest disk we have, and therefore we have no means of backing it up. We don't have anything big enough to hold that much data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any hope? Can we just identify and delete the files that have the db damage on them? (Note that I don't even have anywhere to back up this data to temporarily if we do get it running, so I'd need to "fix in place".)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks!
>>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jim:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you haven't made any recent changes to your pvfs environment or Berkeley DB installation, then it looks like you have a corrupted metadata database. There is no way to easily recover. Sometimes, the Berkeley DB command "db_recover" might work, but PVFS doesn't have transactions turned on, so normally it doesn't work. It's worth a try, just to be sure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you have any recent backups of the databases? If so, then you will need to use a set of backups that were created around the same time, so the databases will be somewhat consistent with each other.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which version of Berkeley DB are you using? We have had corruption issues with older versions of it.
>>>>>>>>>>>>>>> We strongly recommend 4.8 or higher. There are some known problems with threads in the older versions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I got some notices from my users about "weirdness with pvfs2" this morning, and went and investigated. Eventually, I found the following on one of my 3 servers:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>>>>>>>>>>> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>>>>>>>>>>> [E 03/30 12:23] Warning: skipping entry.
>>>>>>>>>>>>>>>> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>>>>>>>>>>> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>>>>>>>>>>> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server; aborting.
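Every server log line quoted in this thread begins with a bracketed one-letter tag, a date, and a time (for example `[E 03/30 12:23]`). A minimal sketch for pulling the `E`-tagged lines out of a saved server log, assuming every entry follows that `[X MM/DD HH:MM] message` layout. The sketch does not interpret what each tag letter means beyond filtering on it, and `LogEntry`, `parse_log_line`, and `entries_with_tag` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, copied from the excerpts in this thread:
#   [E 03/30 12:23] Error: Could not initialize server; aborting.
LOG_LINE = re.compile(
    r"^\[(?P<tag>[A-Z]) (?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2})\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    tag: str
    date: str
    time: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one server log line into its parts; return None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogEntry(**match.groupdict())


def entries_with_tag(lines: Iterable[str], tag: str = "E") -> List[LogEntry]:
    """Keep only the parsed entries whose tag letter equals `tag`."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.tag == tag]


excerpt = [
    "[S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...",
    "[E 03/30 12:23] c_get failed on iteration 3044",
    "[E 03/30 12:23] Error: Could not initialize server; aborting.",
]
for entry in entries_with_tag(excerpt):
    print(entry.time, entry.message)
```

Lines that do not match the assumed layout (wrapped continuations, for instance) are silently skipped rather than raising, so a mangled log degrades to fewer results instead of a crash.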
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------
>>>>>>>>>>>>>>>> pvfs2-fs.conf:
>>>>>>>>>>>>>>>> -----------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <Defaults>
>>>>>>>>>>>>>>>>     UnexpectedRequests 50
>>>>>>>>>>>>>>>>     EventLogging none
>>>>>>>>>>>>>>>>     LogStamp datetime
>>>>>>>>>>>>>>>>     BMIModules bmi_tcp
>>>>>>>>>>>>>>>>     FlowModules flowproto_multiqueue
>>>>>>>>>>>>>>>>     PerfUpdateInterval 1000
>>>>>>>>>>>>>>>>     ServerJobBMITimeoutSecs 30
>>>>>>>>>>>>>>>>     ServerJobFlowTimeoutSecs 30
>>>>>>>>>>>>>>>>     ClientJobBMITimeoutSecs 300
>>>>>>>>>>>>>>>>     ClientJobFlowTimeoutSecs 300
>>>>>>>>>>>>>>>>     ClientRetryLimit 5
>>>>>>>>>>>>>>>>     ClientRetryDelayMilliSecs 2000
>>>>>>>>>>>>>>>>     StorageSpace /mnt/pvfs2
>>>>>>>>>>>>>>>>     LogFile /var/log/pvfs2-server.log
>>>>>>>>>>>>>>>> </Defaults>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <Aliases>
>>>>>>>>>>>>>>>>     Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>>>>>>>>>>>     Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>>>>>>>>>>>     Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>>>>>>>>>>> </Aliases>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <Filesystem>
>>>>>>>>>>>>>>>>     Name pvfs2-fs
>>>>>>>>>>>>>>>>     ID 62659950
>>>>>>>>>>>>>>>>     RootHandle 1048576
>>>>>>>>>>>>>>>>     <MetaHandleRanges>
>>>>>>>>>>>>>>>>         Range pvfs2-io-0-0 4-715827885
>>>>>>>>>>>>>>>>         Range pvfs2-io-0-1 715827886-1431655767
>>>>>>>>>>>>>>>>         Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>>>>>>>>>>>     </MetaHandleRanges>
>>>>>>>>>>>>>>>>     <DataHandleRanges>
>>>>>>>>>>>>>>>>         Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>>>>>>>>>>>         Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>>>>>>>>>>>         Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>>>>>>>>>>>     </DataHandleRanges>
>>>>>>>>>>>>>>>>     <StorageHints>
>>>>>>>>>>>>>>>>         TroveSyncMeta yes
>>>>>>>>>>>>>>>>         TroveSyncData no
>>>>>>>>>>>>>>>>     </StorageHints>
>>>>>>>>>>>>>>>> </Filesystem>
>>>>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>>>> Any suggestions for recovery?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
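The handle range in the "Error adding handle range" log line can be matched against this config: `1431655768-2147483649,3579139414-4294967295` is pvfs2-io-0-2's `MetaHandleRanges` entry followed by its `DataHandleRanges` entry, which agrees with the node named in the log's first line, and the ranges assigned to the three servers do not overlap. A minimal sketch that parses the `Range` lines and checks both points, assuming only the single-span `Range <alias> <first>-<last>` form used here; `parse_ranges`, `ranges_overlap`, and `format_spans` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: every range line has the single-span form "Range <alias> <first>-<last>",
# exactly as in the <MetaHandleRanges> and <DataHandleRanges> blocks above.
RANGE_LINE = re.compile(r"^\s*Range\s+(\S+)\s+(\d+)-(\d+)\s*$")

Span = Tuple[int, int]


def parse_ranges(config_text: str) -> Dict[str, List[Span]]:
    """Map each server alias to its handle ranges, in the order they appear."""
    ranges: Dict[str, List[Span]] = defaultdict(list)
    for line in config_text.splitlines():
        match = RANGE_LINE.match(line)
        if match:
            ranges[match.group(1)].append((int(match.group(2)), int(match.group(3))))
    return dict(ranges)


def ranges_overlap(ranges: Dict[str, List[Span]]) -> bool:
    """True if any two spans, from the same or different servers, share a handle."""
    spans = sorted(span for server_spans in ranges.values() for span in server_spans)
    return any(prev[1] >= nxt[0] for prev, nxt in zip(spans, spans[1:]))


def format_spans(spans: List[Span]) -> str:
    """Render spans as first-last pairs joined by commas, as in the log line above."""
    return ",".join(f"{first}-{last}" for first, last in spans)


CONFIG = """
<MetaHandleRanges>
    Range pvfs2-io-0-0 4-715827885
    Range pvfs2-io-0-1 715827886-1431655767
    Range pvfs2-io-0-2 1431655768-2147483649
</MetaHandleRanges>
<DataHandleRanges>
    Range pvfs2-io-0-0 2147483650-2863311531
    Range pvfs2-io-0-1 2863311532-3579139413
    Range pvfs2-io-0-2 3579139414-4294967295
</DataHandleRanges>
"""

ranges = parse_ranges(CONFIG)
print(format_spans(ranges["pvfs2-io-0-2"]))  # 1431655768-2147483649,3579139414-4294967295
print("overlap found" if ranges_overlap(ranges) else "no overlap")  # no overlap
```

Read this way, the config also shows that each server is assigned its own, disjoint slice of the metadata handle space, which is relevant to Jim's later question about whether the three metadata servers hold identical copies.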
>>>>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Pvfs2-users mailing list
>>>>>>>>>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>>>>> OrangeFS Support and Development
>>>>>>>>>>>>>>> Omnibond Systems
>>>>>>>>>>>>>>> Anderson, South Carolina
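Elaine's usage text for pvfs2-db-display needs a `--hexdir` value as well as `--dbpath`. A minimal sketch for building the command from the posted pvfs2-fs.conf, assuming (this is not stated anywhere in the thread) that the collection directory under the StorageSpace is named after the filesystem `ID` written as eight zero-padded lowercase hex digits. That is at least consistent with the eight-digit example `4e3f77a5` in the usage text. Under that assumption, `ID 62659950` gives `03bc1d6e`. `hexdir_for` and `db_display_command` are hypothetical helpers.

```python
from typing import List


def hexdir_for(fs_id: int) -> str:
    """Format a filesystem ID as eight zero-padded lowercase hex digits (assumed naming)."""
    if not 0 <= fs_id <= 0xFFFFFFFF:
        raise ValueError("this sketch assumes a 32-bit filesystem ID")
    return f"{fs_id:08x}"


def db_display_command(storage_space: str, fs_id: int) -> List[str]:
    """Build an argv for pvfs2-db-display following the usage text in this thread."""
    return ["./pvfs2-db-display", "--dbpath", storage_space, "--hexdir", hexdir_for(fs_id)]


# StorageSpace and ID are taken from the pvfs2-fs.conf posted in this thread.
print(" ".join(db_display_command("/mnt/pvfs2", 62659950)))
# prints ./pvfs2-db-display --dbpath /mnt/pvfs2 --hexdir 03bc1d6e
```

Before running anything, compare the printed `--hexdir` value with the directory names actually present under `/mnt/pvfs2` on the server; if they differ, the directory listing is authoritative and the naming assumption above is wrong.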
