Thanks Boyd:

We have 3 io servers, each also running a metadata server. One will not come up (that's the 3rd server). I did try running the db check command (I forget the specifics), and it returned a single chunk of entries that are not readable. As you may guess from the above, I've never interacted with bdb on a direct or low level. I don't have a good answer for #3; I noticed about 1/3 of the directory entries were "red" on the terminal, and several individuals contacted me with pvfs problems.

I will begin building new versions of bdb. Do I need to install this just on the servers, or do the clients need it as well?

--Jim
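(A note on the "db check command" mentioned above: it was probably one of the stock Berkeley DB utilities, db_verify or db_dump -r. For anyone who would rather do the same readability check through the Berkeley DB C API, here is a minimal sketch; it is illustrative only, and which .db file under the StorageSpace to point it at is an assumption rather than something stated in the thread.)

/*
 * Rough equivalent of "db_dump -r" using the Berkeley DB C API: walk a
 * possibly-corrupt database in salvage mode and write every key/data
 * pair that is still readable to stdout in db_load format.
 * Build:  cc -o bdb_salvage bdb_salvage.c -ldb
 * Usage:  ./bdb_salvage /path/to/dataspace_attributes.db > salvaged.dump
 */
#include <stdio.h>
#include <db.h>

int main(int argc, char **argv)
{
    DB *dbp;
    int ret;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <database file>\n", argv[0]);
        return 1;
    }
    if ((ret = db_create(&dbp, NULL, 0)) != 0) {
        fprintf(stderr, "db_create: %s\n", db_strerror(ret));
        return 1;
    }

    /*
     * DB_SALVAGE writes the readable key/data pairs to the given FILE*
     * in a format db_load can re-import; DB_AGGRESSIVE also tries pages
     * that look damaged.  The handle may not be reused after DB->verify().
     */
    ret = dbp->verify(dbp, argv[1], NULL, stdout, DB_SALVAGE | DB_AGGRESSIVE);
    if (ret != 0)
        fprintf(stderr, "verify: %s\n", db_strerror(ret));

    return ret == 0 ? 0 : 1;
}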
On Sun, Apr 1, 2012 at 4:03 PM, Boyd Wilson <[email protected]> wrote:
> Jim,
> We have been discussing your issue internally. A few questions:
> 1. How many metadata servers do you have?
> 2. Do you know which one is affected (if there is more than one)?
> 3. How much of the file system can you currently see?
>
> The issue you mentioned seems to be the one we have seen with earlier
> versions of BerkeleyDB; we have not seen it with the newer versions, as
> Becky mentioned. In our discussions we couldn't recall whether we had
> tried low-level BDB access to the metadata to read the unaffected
> entries and back them up so they can be restored into a new BDB. If you
> are comfortable with lower-level BDB commands, you may want to see if
> you can read the entries up to the corruption and after it. If you can
> do both, you may be able to write a small program to read out all the
> entries into a file or another BDB, then rebuild the BDB with the valid
> entries.
>
> thx
> -boyd
>
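(The "small program" Boyd describes could look roughly like the sketch below: open the damaged database read-only, walk it with a cursor, and copy every entry that can still be read into a fresh database. This is a sketch under assumptions — the filenames are placeholders, the PVFS server should be stopped first, whether the cursor can be repositioned past the corrupt region, e.g. with DB_SET_RANGE, depends on how badly the btree is damaged, and PVFS may set its own btree comparison function on these databases, so the rebuilt file may still need to be re-imported by a PVFS-aware tool. It only shows the cursor/copy mechanics.)

/*
 * Sketch of the "small program" suggested above: copy all readable
 * entries from a corrupted Berkeley DB file into a new one.
 * Build:  cc -o bdb_copy bdb_copy.c -ldb
 * Usage:  ./bdb_copy <corrupt.db> <new.db>   (with the PVFS server stopped)
 */
#include <stdio.h>
#include <string.h>
#include <db.h>

int main(int argc, char **argv)
{
    DB *src = NULL, *dst = NULL;
    DBC *cur = NULL;
    DBT key, data;
    long copied = 0;
    int ret;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <corrupt.db> <new.db>\n", argv[0]);
        return 1;
    }

    /* Open the damaged database read-only ... */
    if ((ret = db_create(&src, NULL, 0)) != 0 ||
        (ret = src->open(src, NULL, argv[1], NULL, DB_UNKNOWN, DB_RDONLY, 0)) != 0) {
        fprintf(stderr, "open %s: %s\n", argv[1], db_strerror(ret));
        return 1;
    }
    /* ... and create a fresh btree to receive the good entries. */
    if ((ret = db_create(&dst, NULL, 0)) != 0 ||
        (ret = dst->open(dst, NULL, argv[2], NULL, DB_BTREE,
                         DB_CREATE | DB_EXCL, 0600)) != 0) {
        fprintf(stderr, "create %s: %s\n", argv[2], db_strerror(ret));
        return 1;
    }

    if ((ret = src->cursor(src, NULL, &cur, 0)) != 0) {
        fprintf(stderr, "cursor: %s\n", db_strerror(ret));
        return 1;
    }
    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));

    /* Walk the source; stop at the first entry the cursor cannot read. */
    while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0) {
        if ((ret = dst->put(dst, NULL, &key, &data, 0)) != 0) {
            fprintf(stderr, "put: %s\n", db_strerror(ret));
            break;
        }
        copied++;
    }
    if (ret != 0 && ret != DB_NOTFOUND)
        fprintf(stderr, "stopped after %ld entries: %s\n"
                        "(reading the region after the corruption would mean "
                        "re-seeking with DB_SET_RANGE from a later key)\n",
                copied, db_strerror(ret));
    else
        fprintf(stderr, "copied %ld entries\n", copied);

    cur->c_close(cur);
    src->close(src, 0);
    dst->close(dst, 0);
    return 0;
}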
> On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon <[email protected]> wrote:
>>
>> Jim:
>>
>> I understand your situation. Here at Clemson University, we went through
>> the same situation a couple of years ago. Now we back up the metadata
>> databases. We don't have the space to back up our data either!
>>
>> Under no circumstances should you run pvfs2-fsck in destructive mode; if
>> you do, we won't be able to help at all. If you're willing, Omnibond MAY
>> be able to write some utilities that would help you recover most of the
>> data. You will have to speak to Boyd Wilson ([email protected]) and
>> work something out.
>>
>> Becky Ligon
>>
>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
>>>
>>> I made no changes to my environment; it was up and running just fine.
>>> I ran db_recover, and it immediately returned, with no apparent sign
>>> of doing anything but creating a log.000000001 file.
>>>
>>> I have the CentOS DB installed, db4-4.3.29-10.el5.
>>>
>>> I have no backups; this is my high-performance filesystem of 99TB; it
>>> is the largest disk we have, and we therefore have no means of backing
>>> it up. We don't have anything big enough to hold that much data.
>>>
>>> Is there any hope? Can we just identify and delete the files that
>>> have the db damage on them? (Note that I don't even have anywhere to
>>> back up this data to temporarily if we do get it running, so I'd need
>>> to "fix in place".)
>>>
>>> thanks!
>>> --Jim
>>>
>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon <[email protected]> wrote:
>>> > Jim:
>>> >
>>> > If you haven't made any recent changes to your pvfs environment or
>>> > Berkeley DB installation, then it looks like you have a corrupted
>>> > metadata database. There is no way to easily recover. Sometimes the
>>> > Berkeley DB command "db_recover" might work, but PVFS doesn't have
>>> > transactions turned on, so normally it doesn't work. It's worth a
>>> > try, just to be sure.
>>> >
>>> > Do you have any recent backups of the databases? If so, then you
>>> > will need to use a set of backups that were created around the same
>>> > time, so the databases will be somewhat consistent with each other.
>>> >
>>> > Which version of Berkeley DB are you using? We have had corruption
>>> > issues with older versions of it. We strongly recommend 4.8 or
>>> > higher. There are some known problems with threads in the older
>>> > versions.
>>> >
>>> > Becky Ligon
>>> >
>>> > On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hi all:
>>> >>
>>> >> I got some notices from my users about "weirdness with pvfs2" this
>>> >> morning and went and investigated. Eventually, I found the
>>> >> following on one of my 3 servers:
>>> >>
>>> >> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>> >> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>> >> [E 03/30 12:23] Warning: skipping entry.
>>> >> [E 03/30 12:23] c_get failed on iteration 3044
>>> >> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>> >> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>> >> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>> >> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>> >>
>>> >> ------------
>>> >> pvfs2-fs.conf:
>>> >> ------------
>>> >>
>>> >> <Defaults>
>>> >>     UnexpectedRequests 50
>>> >>     EventLogging none
>>> >>     LogStamp datetime
>>> >>     BMIModules bmi_tcp
>>> >>     FlowModules flowproto_multiqueue
>>> >>     PerfUpdateInterval 1000
>>> >>     ServerJobBMITimeoutSecs 30
>>> >>     ServerJobFlowTimeoutSecs 30
>>> >>     ClientJobBMITimeoutSecs 300
>>> >>     ClientJobFlowTimeoutSecs 300
>>> >>     ClientRetryLimit 5
>>> >>     ClientRetryDelayMilliSecs 2000
>>> >>     StorageSpace /mnt/pvfs2
>>> >>     LogFile /var/log/pvfs2-server.log
>>> >> </Defaults>
>>> >>
>>> >> <Aliases>
>>> >>     Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>> >>     Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>> >>     Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>> >> </Aliases>
>>> >>
>>> >> <Filesystem>
>>> >>     Name pvfs2-fs
>>> >>     ID 62659950
>>> >>     RootHandle 1048576
>>> >>     <MetaHandleRanges>
>>> >>         Range pvfs2-io-0-0 4-715827885
>>> >>         Range pvfs2-io-0-1 715827886-1431655767
>>> >>         Range pvfs2-io-0-2 1431655768-2147483649
>>> >>     </MetaHandleRanges>
>>> >>     <DataHandleRanges>
>>> >>         Range pvfs2-io-0-0 2147483650-2863311531
>>> >>         Range pvfs2-io-0-1 2863311532-3579139413
>>> >>         Range pvfs2-io-0-2 3579139414-4294967295
>>> >>     </DataHandleRanges>
>>> >>     <StorageHints>
>>> >>         TroveSyncMeta yes
>>> >>         TroveSyncData no
>>> >>     </StorageHints>
>>> >> </Filesystem>
>>> >> -------------
>>> >> Any suggestions for recovery?
>>> >>
>>> >> Thanks!
>>> >> --Jim
>>> >
>>> > --
>>> > Becky Ligon
>>> > OrangeFS Support and Development
>>> > Omnibond Systems
>>> > Anderson, South Carolina
>>
>> --
>> Becky Ligon
>> OrangeFS Support and Development
>> Omnibond Systems
>> Anderson, South Carolina
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
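(A closing sketch, not something proposed in the thread itself: once a database has been rebuilt from the valid entries along the lines Boyd suggests, a plain verify pass — the non-salvage counterpart of the check above — is a cheap sanity test before the rebuilt file replaces the original under the StorageSpace and the server is restarted.)

/*
 * Sanity-check a rebuilt database with a non-salvage DB->verify(),
 * roughly what the db_verify utility does.
 * Build:  cc -o bdb_verify bdb_verify.c -ldb
 * Usage:  ./bdb_verify <rebuilt.db>
 */
#include <stdio.h>
#include <db.h>

int main(int argc, char **argv)
{
    DB *dbp;
    int ret;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <database file>\n", argv[0]);
        return 1;
    }
    if ((ret = db_create(&dbp, NULL, 0)) != 0) {
        fprintf(stderr, "db_create: %s\n", db_strerror(ret));
        return 1;
    }
    /* No DB_SALVAGE: just check the file's structure; the handle is
     * destroyed by the call and may not be reused afterwards. */
    ret = dbp->verify(dbp, argv[1], NULL, NULL, 0);
    printf("%s: %s\n", argv[1], ret == 0 ? "OK" : db_strerror(ret));
    return ret == 0 ? 0 : 1;
}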
