On Mon, 4 Feb 2008, Howard Chu wrote:

> That documentation is clearly obsolete, which is why it was removed.

slurpd is obsolete, which is why the section on slurpd was removed from the
2.4 manual. Considering that OpenLDAP-2.3.39 is still marked as the stable
release, I can't really see that the 2.3 documentation in its entirety is
obsolete.

> http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/archival.html

Ah, that is the section on backing up and restoring a database, which I
suppose could also be considered the procedure for copying a database from
one system to another. Given your original wording, I was looking for
something more specifically geared towards copying.

> At a guess, you failed to copy the transaction log files to the slaves.

If I had failed to copy the transaction log files, I don't see how it would
have worked at all, let alone for almost a year.

Reviewing the backup/restore procedure, I don't see anything I might have
missed. slapd was not running during the copy, so no updates could have been
in progress. In fact, slapd had never been run: the copy was made immediately
after the initial slapadd. There were actually no log files present. As I
mentioned, I have bdb configured to remove them automatically. Presumably
slapadd explicitly or implicitly checkpointed the database upon completion,
and the logs were then removed. Even if there was a log file that I didn't
see, the log files were stored in the same directory as the database files,
and I copied the entire directory.

> > Also, even if for some reason the copies on the two slaves were invalid,
> > that would not explain why the master failed. The database on the master
> > was the original database built by slapadd when the server was first put
> > into commission. How could making a copy of it have caused it to fail
> > itself?
>
> Too difficult to guess, given the lack of information. We have only your
> assurance that nothing was done incorrectly, but the facts indicate that at
> least one step was done incorrectly.

The facts only indicate that I had a catastrophic failure.
That the failure was caused by incompetence is only a hypothesis.

I do greatly appreciate your response and willingness to help; I apologize
if I'm getting a bit defensive. You do have only my assurance that I didn't
screw something up. However, assuming I'm not lying, the facts are:

* openldap 2.3.35 was initially installed on three servers
* on the master server, slapadd was run to load an existing database in
  ldif format
* the resulting bdb database was then copied to both slaves
* all three were put into production in March 2007 and ran perfectly under
  a reasonably heavy load
* a week or so ago I upgraded them to 2.3.40 (stop old server, install new
  server, start new server; never touching bdb or the existing database
  files)
* they ran fine for at least 3-4 days
* this weekend, they died horribly

Given these facts, if something was done incorrectly, it does not seem likely
that it was a failure to copy a transaction log file in March 2007. If the
failure was my own doing, it seems more likely to be a byproduct of the
upgrade, although I can't think of anything I could have done wrong during
that process.

At this point, I guess I'll just write it off and hope it doesn't happen
again.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
