Quick follow-up: I saved and cleared out the DB directory for the Linux rpm instance, and ran "rpm --db-build".
Output was:

> rpm --db-build
rpmdb: BUILDING NEW RPM DATABASE FROM SCRATCH (/usr/psr.oit/redhat9/RPM/DB)
rpmdb: removing (possibly existing) old RPM database DB files
rpmdb: creating new RPM database (built-in RPM procedure)
rpmdb: operating on new RPM database
rpmdb: rebuilding new RPM database (built-in RPM procedure)
rpmdb: making sure RPM database contains all possible DB files
rpmdb: rebuilding RPM database (built-in RPM procedure)
rpmdb: performing read/write operation on RPM database

It looks like all the files are there:

> ls -l
total 884
-rw-rw-r-- 1 psr psr 49152 Oct 16 17:00 Basenames
-rw-rw-r-- 1 psr psr 49152 Oct 16 17:00 Conflictname
-rw-r--r-- 1 psr psr 8192 Oct 16 17:00 __db.001
-rw-r--r-- 1 psr psr 737280 Oct 16 17:00 __db.003
-rw-rw-r-- 1 psr psr 0 Oct 16 17:00 __db.004
-rw-rw-r-- 1 psr psr 0 Oct 16 17:00 __db.005
-rw-rw-r-- 1 psr psr 0 Oct 16 17:00 __db.006
-rw-rw-r-- 1 psr psr 0 Oct 16 17:00 __db.007
-rw-rw-r-- 1 psr psr 0 Oct 16 17:00 __db.008
-rw-rw-r-- 1 psr psr 0 Oct 16 17:00 __db.009
-rw-rw-r-- 1 psr psr 49152 Oct 16 17:00 Depends
-rw-rw-r-- 1 psr psr 32768 Oct 16 17:00 Dirnames
-rw-rw-r-- 1 psr psr 49152 Oct 16 17:00 Filemd5s
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Group
-rw-r--r-- 1 psr psr 8192 Oct 16 17:00 Installtid
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Name
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Packages
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Providename
-rw-r--r-- 1 psr psr 8192 Oct 16 17:00 Provideversion
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Pubkeys
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Requirename
-rw-rw-r-- 1 psr psr 32768 Oct 16 17:00 Requireversion
-rw-rw-r-- 1 psr psr 49152 Oct 16 17:00 Sha1header
-rw-rw-r-- 1 psr psr 49152 Oct 16 17:00 Sigmd5
-rw-r--r-- 1 psr psr 12288 Oct 16 17:00 Triggername

But the program hangs.

"ps -ef" after several minutes reveals:

> ps -ef|grep rpm
psr 20444 12122 0 17:00 pts/3 00:00:00 /usr/psr.oit/redhat9/lib/openpkg/bash /usr/psr.oit/redhat9/lib/openpkg/rpmdb --build
psr 20682 20444 0 17:00 pts/3 00:00:00 /usr/psr.oit/redhat9/lib/openpkg/rpmq -q --define _dbpath /usr/psr.oit/redhat9/RPM/DB/ -- gpg-pubkey-63c4cb9f-3c591eda

It is apparently the child process that is hung, as I can kill the parent with "kill 20444". I can't kill the child
without using "kill -9 20682".

Dennis

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Dennis McRitchie
> Sent: Thursday, October 16, 2003 4:39 PM
> To: [EMAIL PROTECTED]
> Subject: Problems running openpkg-20031006-20031006 on Solaris and Linux
> (was Two problems building/running openpkg-20031006-20031006 on RedHat 9)
>
>
> I decided to combine this thread with my other thread about similar
> problems on Solaris 9.
>
> 1) Thanks Ralf for the test program. Our NFS file system is hosted on a
> Solaris 8 system. We have two sets of folders on that same NFS file
> system. One is for Solaris 9 programs/files and the other is for
> RedHat 9 programs/files.
>
> Last week, your test program was failing when the file it was trying to
> create was on the NFS system, whether I compiled and ran it from an RH9
> or Sol9 system. On both systems I got an ENOLCK. We think this was a
> genuine resource exhaustion problem, which we are looking into. This
> week, your test program runs successfully under the same conditions on
> both Linux and Solaris machines. So that variable has been removed, and
> I thank you for your help with that.
>
> 2) Once your program ran, I tried again to run "rpm --db-rebuild" on our
> RH9 system (as always, the DB files were on the NFS system).
> No complaints this time, and the job was completed with the expected
> messages:
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > rpm --db-rebuild
> rpmdb: REBUILDING NEW FROM OLD RPM DATABASE (/usr/psr.oit/redhat9/RPM/DB)
> rpmdb: cleaning up RPM database DB region files
> rpmdb: making sure RPM database contains all possible DB files
> rpmdb: dumping and reloading RPM database DB file contents
> rpmdb: rebuilding RPM database (built-in RPM procedure)
> rpmdb: performing read/write operation on RPM database
> rpmdb: making sure RPM database files have consistent attributes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> However, when I then ran an "rpm -qa" it printed out what looks like
> part of your public key!
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > rpm -qa
> gpg-pubkey-63c4cb9f-3c591eda
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Previously, there were 24 packages in the DB. It could be, of course,
> that the DB got damaged when I was trying to rebuild and getting
> ENOLCKs. But I did try Jeff Johnson's quick fix (below) after a failed
> rebuild attempt, and found that all my packages were still there at
> that time.
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> %__dbi_cdb create cdb mpool mp_mmapsize=16Mb mp_size=1Mb private
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> However, I also tried a --db-cleanup, so... In any event, I thought I
> should report it. Also, how can I rebuild the DB so it has all the
> packages listed again? I just tried --db-build and --db-cleanup but
> that didn't change anything.
>
> 3) When I tried to run "rpm --db-rebuild" on our Solaris 9 system (as
> always, the DB files were on the NFS system), I got the same problem as
> before: EAGAIN on the mmap calls. (Yet your test program still runs
> successfully.)
> Stderr output is:
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > rpm --version
> OpenPKG RPM 4.2.1
> > rpm --db-rebuild
> rpmdb: REBUILDING NEW FROM OLD RPM DATABASE (/usr/psr.oit/solaris9/RPM/DB)
> rpmdb: cleaning up RPM database DB region files
> rpmdb: making sure RPM database contains all possible DB files
> rpmdb: dumping and reloading RPM database DB file contents
> rpmdb: rebuilding RPM database (built-in RPM procedure)
> rpmdb: mmap: Resource temporarily unavailable
> error: db4 error(11) from dbenv->open: Resource temporarily unavailable
> error: cannot open Packages index
> rpmdb: performing read/write operation on RPM database
> rpmdb: mmap: Resource temporarily unavailable
> error: db4 error(11) from dbenv->open: Resource temporarily unavailable
> error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
> error: cannot open Packages database in /usr/psr.oit/solaris9/RPM/DB
> rpmdb: mmap: Resource temporarily unavailable
> error: db4 error(11) from dbenv->open: Resource temporarily unavailable
> error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
> error: cannot open Packages database in /usr/psr.oit/solaris9/RPM/DB
> rpmdb: mmap: Resource temporarily unavailable
> error: db4 error(11) from dbenv->open: Resource temporarily unavailable
> error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
> error: cannot open Packages database in /usr/psr.oit/solaris9/RPM/DB
> error: /usr/psr.oit/solaris9/etc/openpkg/openpkg.pgp: import failed.
> rpmdb: making sure RPM database files have consistent attributes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Sample truss -f output is:
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> 9887: getuid() = 44976 [44976]
> 9887: getgid() = 20110 [20110]
> 9887: stat64("/", 0xFFBFF6A8) = 0
> 9887: stat64("/usr/", 0xFFBFF6A8) = 0
> 9887: stat64("/usr/psr.oit/", 0xFFBFF6A8) = 0
> 9887: stat64("/usr/psr.oit/solaris9/", 0xFFBFF6A8) = 0
> 9887: stat64("/usr/psr.oit/solaris9/RPM/", 0xFFBFF6A8) = 0
> 9887: stat64("/usr/psr.oit/solaris9/RPM/DB", 0xFFBFF6A8) = 0
> 9887: access("/usr/psr.oit/solaris9/RPM/DB", 2) = 0
> 9887: stat64("/usr/psr.oit/solaris9/RPM/DB/__db.001", 0xFFBFF7D0) = 0
> 9887: access("/usr/psr.oit/solaris9/RPM/DB/__db.001", 0) = 0
> 9887: access("/usr/psr.oit/solaris9/RPM/DB/Packages", 0) = 0
> 9887: stat("/usr/psr.oit/solaris9/RPM/DB/DB_CONFIG", 0xFFBFF3D8) Err#2 ENOENT
> 9887: open("/usr/psr.oit/solaris9/RPM/DB/DB_CONFIG", O_RDONLY) Err#2 ENOENT
> 9887: stat("/usr/psr.oit/solaris9/RPM/DB/__db.001", 0xFFBFF458) = 0
> 9887: open("/usr/psr.oit/solaris9/RPM/DB/__db.001", O_RDWR|O_CREAT|O_EXCL, 0644) Err#17 EEXIST
> 9887: open("/usr/psr.oit/solaris9/RPM/DB/__db.001", O_RDWR) = 3
> 9887: fcntl(3, F_SETFD, 0x00000001) = 0
> 9887: ioctl(3, 0x2000664C, 0x00000001) = 0
> 9887: fstat(3, 0xFFBFF4D0) = 0
> 9887: open("/usr/psr.oit/solaris9/RPM/DB/__db.001", O_RDWR|O_CREAT, 0644) = 4
> 9887: fcntl(4, F_SETFD, 0x00000001) = 0
> 9887: ioctl(4, 0x2000664C, 0x00000001) = 0
> 9887: lseek(4, 0, SEEK_END) = 0
> 9887: lseek(4, 0, SEEK_CUR) = 0
> 9887: write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 8192) = 8192
> 9887: mmap(0x00000000, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) Err#11 EAGAIN
> 9887: fstat64(2, 0xFFBFE488) = 0
> 9887: write(2, " r p m d b", 5) = 5
> 9887: write(2, " : ", 2) = 2
> 9887: write(2, " m m a p : ", 6) = 6
> 9887: write(2, " R e s o u r c e t e m".., 32) = 32
> 9887: write(2, "\n", 1) = 1
> 9887: close(4) = 0
> 9887: close(3) = 0
> 9887: write(2, " e r r o r : ", 7) = 7
> 9887: write(2, " d b 4 e r r o r ( 1 1".., 65) = 65
> 9887: write(2, " e r r o r : ", 7) = 7
> 9887: write(2, " c a n n o t o p e n ".., 27) = 27
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Any idea why this might still be happening even though your test
> program seems to suggest that locking is working?
>
> Thanks,
> Dennis
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Ralf S. Engelschall
> > Sent: Wednesday, October 08, 2003 3:35 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Two problems building/running openpkg-20031006-20031006 on
> > RedHat 9
> >
> >
> > On Wed, Oct 08, 2003, Dennis McRitchie wrote:
> >
> > > [...]
> > > 1) Problem building if beecrypt-devel-2.2.0-8 package is installed on
> > > RedHat 9: If this package (which is distributed with RedHat 9) is
> > > installed on a system where openpkg-20031006-20031006 is to be built,
> > > the rpm configure script detects its presence:
> > >
> > > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > > checking beecrypt/beecrypt.h usability... yes
> > > checking beecrypt/beecrypt.h presence... yes
> > > checking for beecrypt/beecrypt.h... yes
> > > checking for mpfprintln in -lbeecrypt... yes
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > This causes it to set "WITH_BEECRYPT_INCLUDE = /usr/include/beecrypt"
> > > and this in turn causes the rpmio Makefile to create the following
> > > command line:
> > >
> > > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > > /usr/psr.oit/redhat9/bin/gcc -DHAVE_CONFIG_H -I. -I. -I.. -I. -I..
> > > -I/usr/include/beecrypt -I../popt -I/usr/psr.oit/redhat9/include -DOPENPKG
> > > -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/zlib-1.1.4
> > > -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/bzip2-1.0.2
> > > -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/beecrypt-3.1.0 -DOPENPKG
> > > -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/zlib-1.1.4
> > > -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/bzip2-1.0.2
> > > -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/beecrypt-3.1.0
> > > -O2 -D_GNU_SOURCE -D_REENTRANT -MT digest.lo -MD -MP -MF .deps/digest.Tpo
> > > -c digest.c -o digest.o
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > Note that the /usr/include/beecrypt (v2.2.0) is ahead of the
> > > within-package beecrypt (v3.1.0) which causes the following problems:
> > > [...]
> >
> > Fixed with openpkg-20031008-20031008 (see
> > http://cvs.openpkg.org/chngview?cn=12699 for details).
> > Thanks for the hint.
> >
> > > [...]
> > > 2) Problem running OpenPKG rpm v4.2.1 on RedHat 9: Once I got past the
> > > above build problem, I tried running it on two RedHat 9 machines and
> > > got the same problem on both:
> > >
> > > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > > > rpm --version
> > > OpenPKG RPM 4.2.1
> > > > rpm --db-rebuild
> > > rpmdb: REBUILDING NEW FROM OLD RPM DATABASE (/usr/psr.oit/redhat9/RPM/DB)
> > > rpmdb: cleaning up RPM database DB region files
> > > rpmdb: making sure RPM database contains all possible DB files
> > > rpmdb: dumping and reloading RPM database DB file contents
> > > rpmdb: rebuilding RPM database (built-in RPM procedure)
> > > rpmdb: /usr/psr.oit/redhat9/RPM/DB/__db.001: unable to acquire environment lock: No locks available
> > > [...]
> > > error: db4 error(37) from dbenv->open: No locks available
> > > 25899 stat64("/usr/psr.oit/redhat9/RPM/DB/__db.001", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
> > > 25899 open("/usr/psr.oit/redhat9/RPM/DB/__db.001", O_RDWR|O_CREAT|O_EXCL, 0644) = -1 EEXIST (File exists)
> > > 25899 open("/usr/psr.oit/redhat9/RPM/DB/__db.001", O_RDWR) = 3
> > > 25899 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
> > > 25899 fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
> > > 25899 open("/usr/psr.oit/redhat9/RPM/DB/__db.001", O_RDWR|O_CREAT, 0644) = 4
> > > 25899 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
> > > 25899 lseek(4, 0, SEEK_END) = 0
> > > 25899 lseek(4, 0, SEEK_CUR) = 0
> > > 25899 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> > > 25899 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x40038000
> > > 25899 close(4) = 0
> > > 25899 fcntl64(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 ENOLCK (No locks available)
> > > 25899 write(2, "rpmdb: ", 7) = 7
> > > 25899 write(2, "/usr/psr.oit/redhat9/RPM/DB/__db"..., 92) = 92
> > > 25899 write(2, "\n", 1) = 1
> > > 25899 close(3) = 0
> > > 25899 munmap(0x40038000, 8192) = 0
> > > 25899 unlink("/usr/psr.oit/redhat9/RPM/DB/__db.001") = 0
> > > 25899 brk(0) = 0x818e000
> > > 25899 brk(0x8190000) = 0x8190000
> > > 25899 write(2, "error: ", 7) = 7
> > > 25899 write(2, "db4 error(37) from dbenv->open: "..., 51) = 51
> > > 25899 write(2, "error: ", 7) = 7
> > > 25899 write(2, "cannot open Packages index\n", 27) = 27
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > Here again, $prefix is on an NFS-mounted file system.
> > >
> > > Any thoughts as to what might be going wrong?
> >
> > This error happens just once within RPM/DB:
> >
> > | if (!F_ISSET(&renv->mutex, MUTEX_IGNORE) &&
> > |     (ret = __db_mutex_lock(dbenv, &renv->mutex)) != 0) {
> > |         __db_err(dbenv, "%s: unable to acquire environment lock: %s",
> > |             infop->name, db_strerror(ret));
> > |         goto err;
> > | }
> >
> > And the __db_mutex_lock() internally maps into __db_fcntl_mutex_lock()
> > which returns the ENOLCK ("No locks available") error only for the
> > failing fcntl(2) calls. So, it is really a classical locking problem
> > inside Berkeley-DB here and not related to RPM at all.
> >
> > The question remains why the fcntl(2) calls fail with ENOLCK.
> > Can you run the following program once while staying inside your
> > /usr/psr.oit/redhat9/RPM/DB/ directory and once while staying on a local
> > filesystem?
> >
> > -----------------------------------------------------
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> >
> > int main(int argc, char *argv[])
> > {
> >     int fd;
> >     struct flock l;
> >     int rv;
> >
> >     /* create (or open) a scratch file in the current directory */
> >     fd = open("fuck2.db", O_RDWR|O_CREAT, 0644);
> >     if (fd == -1) {
> >         perror("open");
> >         return 1;
> >     }
> >     /* request a blocking write lock covering the whole file */
> >     l.l_type = F_WRLCK;
> >     l.l_whence = SEEK_SET;
> >     l.l_start = 0;
> >     l.l_len = 0;
> >     rv = fcntl(fd, F_SETLKW, &l);
> >     if (rv == -1)
> >         perror("fcntl");
> >     printf("rv=%d\n", rv);
> >     close(fd);
> >     return 0;
> > }
> > -----------------------------------------------------
> >
> > I hope it prints rv=-1 on NFS and rv=0 on a local filesystem. My FreeBSD
> > manpages for fcntl(2) talk about ENOLCK this way:
> >
> > | [ENOLCK] The argument cmd is F_SETLK or F_SETLKW, and satisfying
> > |          the lock or unlock request would result in the
> > |          number of locked regions in the system exceeding a
> > |          system-imposed limit.
> >
> > And some Linux manpages also say:
> >
> > | ENOLCK
> > | Too many segment locks open, lock table is full, or a remote locking
> > | protocol failed (e.g. locking over NFS).
> >
> > So, NFS might definitely be the problem.
> >
> > If I run the above test program on our RedHat 9 box on an NFS filesystem
> > mounted from a NetApp filer or a Solaris 8 box, it works fine (rv=0).
> > Same on a local filesystem. But if I run it on an NFS filesystem mounted
> > from a FreeBSD or Linux box, it fails with rv=-1. The reason is that the
> > FreeBSD and Linux boxes do not support NFS locking.
> >
> > So, what type is your NFS server? If it is FreeBSD or Linux, try
> > a different one and repeat. I'm sure it will then work fine...
> >
> > Ralf S. Engelschall
> > [EMAIL PROTECTED]
> > www.engelschall.com
> >
> > ______________________________________________________________________
> > The OpenPKG Project www.openpkg.org
> > User Communication List [EMAIL PROTECTED]
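
A note on question 3 in the quoted message: in the truss output the call that fails is mmap(2) on __db.001 (Err#11 EAGAIN), not fcntl(2), so the lock test program never exercises the failing call. Below is a rough sketch that repeats the sequence visible in the trace (open with O_RDWR|O_CREAT, write 8192 zero bytes, then mmap with PROT_READ|PROT_WRITE and MAP_SHARED), so it can be tried once inside the RPM/DB directory on NFS and once on a local filesystem. The function name probe_shared_mmap and the scratch file name are made up for illustration; this is not code from RPM or Berkeley DB, and it only assumes the trace shows the relevant calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of RPM or Berkeley DB): repeat the
 * open/write/mmap sequence from the truss output on a scratch file.
 * Returns 0 on success, or the errno value of the first failing call.
 * The scratch file is removed afterwards, so never pass a real DB file.
 */
int probe_shared_mmap(const char *path, size_t len)
{
    int fd;
    int err = 0;
    char *zeros;
    ssize_t n;
    void *addr;

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1) {
        err = errno;
        fprintf(stderr, "open: %s\n", strerror(err));
        return err;
    }

    /* Fill the file with zero bytes, as the trace does before mmap. */
    zeros = calloc(1, len);
    if (zeros == NULL) {
        close(fd);
        return ENOMEM;
    }
    errno = 0;
    n = write(fd, zeros, len);
    free(zeros);
    if (n != (ssize_t)len) {
        err = (n == -1) ? errno : EIO;   /* short write reported as EIO */
        fprintf(stderr, "write: %s\n", strerror(err));
        goto out;
    }

    /* Same protection and flags as the mmap call in the trace. */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        err = errno;
        fprintf(stderr, "mmap: %s\n", strerror(err));
    } else {
        munmap(addr, len);
    }

out:
    close(fd);
    unlink(path);
    return err;
}
```

Called from a small main() as probe_shared_mmap("probe.tmp", 8192), a result of 0 on a local filesystem together with EAGAIN in the NFS-mounted DB directory would suggest the problem lies with shared mappings on that mount rather than with locking. That is only an inference from the trace, not something confirmed in this thread.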
