Problems running openpkg-20031006-20031006 on Solaris and Linux (was Two problems building/running openpkg-20031006-20031006 on RedHat 9)

Dennis McRitchie Thu, 16 Oct 2003 13:37:39 -0700

I decided to combine this thread with my other thread about similar
problems on Solaris 9.


1) Thanks, Ralf, for the test program. Our NFS file system is hosted on a
Solaris 8 system. We have two sets of folders on that same NFS file
system: one for Solaris 9 programs/files and the other for RedHat 9
programs/files.

Last week, your test program was failing when the file it was trying to
create was on the NFS system, whether I compiled and ran it from an RH9 or
a Sol9 system. On both systems I got an ENOLCK. We think this was a
genuine resource exhaustion problem, which we are looking into. This week,
your test program runs successfully under the same conditions on both
Linux and Solaris machines. So that variable has been removed, and I thank
you for your help with that.

2) Once your program ran, I tried again to run "rpm --db-rebuild" on our
RH9 system (as always, the DB files were on the NFS system). No complaints
this time, and the job completed with the expected messages:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> rpm --db-rebuild
rpmdb: REBUILDING NEW FROM OLD RPM DATABASE (/usr/psr.oit/redhat9/RPM/DB)
rpmdb: cleaning up RPM database DB region files
rpmdb: making sure RPM database contains all possible DB files
rpmdb: dumping and reloading RPM database DB file contents
rpmdb: rebuilding RPM database (built-in RPM procedure)
rpmdb: performing read/write operation on RPM database
rpmdb: making sure RPM database files have consistent attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, when I then ran "rpm -qa", it printed out what looks like part of
your public key!

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> rpm -qa
gpg-pubkey-63c4cb9f-3c591eda
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, there were 24 packages in the DB. It could be, of course, that
the DB got damaged when I was trying to rebuild and getting ENOLCKs. But I
did try Jeff Johnson's quick fix (below) after a failed rebuild attempt,
and found that all my packages were still there at that time.

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
%__dbi_cdb create cdb mpool mp_mmapsize=16Mb mp_size=1Mb private
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, I also tried a --db-cleanup, so... In any event, I thought I
should report it. Also, how can I rebuild the DB so it has all the
packages listed again? I just tried --db-build and --db-cleanup, but that
didn't change anything.

3) When I tried to run "rpm --db-rebuild" on our Solaris 9 system (as
always, the DB files were on the NFS system), I got the same problem as
before: EAGAIN on the mmap calls. (Yet your test program still runs
successfully.) Stderr output is:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> rpm --version
OpenPKG RPM 4.2.1
> rpm --db-rebuild
rpmdb: REBUILDING NEW FROM OLD RPM DATABASE (/usr/psr.oit/solaris9/RPM/DB)
rpmdb: cleaning up RPM database DB region files
rpmdb: making sure RPM database contains all possible DB files
rpmdb: dumping and reloading RPM database DB file contents
rpmdb: rebuilding RPM database (built-in RPM procedure)
rpmdb: mmap: Resource temporarily unavailable
error: db4 error(11) from dbenv->open: Resource temporarily unavailable
error: cannot open Packages index
rpmdb: performing read/write operation on RPM database
rpmdb: mmap: Resource temporarily unavailable
error: db4 error(11) from dbenv->open: Resource temporarily unavailable
error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
error: cannot open Packages database in /usr/psr.oit/solaris9/RPM/DB
rpmdb: mmap: Resource temporarily unavailable
error: db4 error(11) from dbenv->open: Resource temporarily unavailable
error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
error: cannot open Packages database in /usr/psr.oit/solaris9/RPM/DB
rpmdb: mmap: Resource temporarily unavailable
error: db4 error(11) from dbenv->open: Resource temporarily unavailable
error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
error: cannot open Packages database in /usr/psr.oit/solaris9/RPM/DB
error: /usr/psr.oit/solaris9/etc/openpkg/openpkg.pgp: import failed.
rpmdb: making sure RPM database files have consistent attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sample truss -f output is:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
9887:   getuid()                                        = 44976 [44976]
9887:   getgid()                                        = 20110 [20110]
9887:   stat64("/", 0xFFBFF6A8)                         = 0
9887:   stat64("/usr/", 0xFFBFF6A8)                     = 0
9887:   stat64("/usr/psr.oit/", 0xFFBFF6A8)             = 0
9887:   stat64("/usr/psr.oit/solaris9/", 0xFFBFF6A8)    = 0
9887:   stat64("/usr/psr.oit/solaris9/RPM/", 0xFFBFF6A8) = 0
9887:   stat64("/usr/psr.oit/solaris9/RPM/DB", 0xFFBFF6A8) = 0
9887:   access("/usr/psr.oit/solaris9/RPM/DB", 2)       = 0
9887:   stat64("/usr/psr.oit/solaris9/RPM/DB/__db.001", 0xFFBFF7D0) = 0
9887:   access("/usr/psr.oit/solaris9/RPM/DB/__db.001", 0) = 0
9887:   access("/usr/psr.oit/solaris9/RPM/DB/Packages", 0) = 0
9887:   stat("/usr/psr.oit/solaris9/RPM/DB/DB_CONFIG", 0xFFBFF3D8) Err#2 ENOENT
9887:   open("/usr/psr.oit/solaris9/RPM/DB/DB_CONFIG", O_RDONLY) Err#2 ENOENT
9887:   stat("/usr/psr.oit/solaris9/RPM/DB/__db.001", 0xFFBFF458) = 0
9887:   open("/usr/psr.oit/solaris9/RPM/DB/__db.001", O_RDWR|O_CREAT|O_EXCL, 0644) Err#17 EEXIST
9887:   open("/usr/psr.oit/solaris9/RPM/DB/__db.001", O_RDWR) = 3
9887:   fcntl(3, F_SETFD, 0x00000001)                   = 0
9887:   ioctl(3, 0x2000664C, 0x00000001)                = 0
9887:   fstat(3, 0xFFBFF4D0)                            = 0
9887:   open("/usr/psr.oit/solaris9/RPM/DB/__db.001", O_RDWR|O_CREAT, 0644) = 4
9887:   fcntl(4, F_SETFD, 0x00000001)                   = 0
9887:   ioctl(4, 0x2000664C, 0x00000001)                = 0
9887:   lseek(4, 0, SEEK_END)                           = 0
9887:   lseek(4, 0, SEEK_CUR)                           = 0
9887:   write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 8192)    = 8192
9887:   mmap(0x00000000, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) Err#11 EAGAIN
9887:   fstat64(2, 0xFFBFE488)                          = 0
9887:   write(2, " r p m d b", 5)                       = 5
9887:   write(2, " :  ", 2)                             = 2
9887:   write(2, " m m a p :  ", 6)                     = 6
9887:   write(2, " R e s o u r c e   t e m".., 32)      = 32
9887:   write(2, "\n", 1)                               = 1
9887:   close(4)                                        = 0
9887:   close(3)                                        = 0
9887:   write(2, " e r r o r :  ", 7)                   = 7
9887:   write(2, " d b 4   e r r o r ( 1 1".., 65)      = 65
9887:   write(2, " e r r o r :  ", 7)                   = 7
9887:   write(2, " c a n n o t   o p e n  ".., 27)      = 27
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any idea why this might still be happening even though your test program
seems to suggest that locking is working?

Thanks,
       Dennis

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Ralf S. Engelschall
> Sent: Wednesday, October 08, 2003 3:35 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Two problems building/running openpkg-20031006-20031006 on
> RedHat 9
>
>
> On Wed, Oct 08, 2003, Dennis McRitchie wrote:
>
> > [...]
> > 1) Problem building if beecrypt-devel-2.2.0-8 package is installed on
> > RedHat 9: If this package (which is distributed with RedHat 9) is
> > installed on a system where openpkg-20031006-20031006 is to be built,
> > the rpm configure script detects its presence:
> >
> > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > checking beecrypt/beecrypt.h usability... yes
> > checking beecrypt/beecrypt.h presence... yes
> > checking for beecrypt/beecrypt.h... yes
> > checking for mpfprintln in -lbeecrypt... yes
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > This causes it to set "WITH_BEECRYPT_INCLUDE = /usr/include/beecrypt"
> > and this in turn causes the rpmio Makefile to create the following
> > command line:
> >
> > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > /usr/psr.oit/redhat9/bin/gcc -DHAVE_CONFIG_H -I. -I. -I.. -I. -I.. \
> >   -I/usr/include/beecrypt -I../popt -I/usr/psr.oit/redhat9/include \
> >   -DOPENPKG \
> >   -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/zlib-1.1.4 \
> >   -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/bzip2-1.0.2 \
> >   -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/beecrypt-3.1.0 \
> >   -DOPENPKG \
> >   -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/zlib-1.1.4 \
> >   -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/bzip2-1.0.2 \
> >   -I/usr/psr.oit/redhat9/RPM/TMP/openpkg-20031006/beecrypt-3.1.0 \
> >   -O2 -D_GNU_SOURCE -D_REENTRANT -MT digest.lo -MD -MP \
> >   -MF .deps/digest.Tpo -c digest.c -o digest.o
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Note that the /usr/include/beecrypt (v2.2.0) is ahead of the
> > within-package beecrypt (v3.1.0) which causes the following problems:
> > [...]
>
> Fixed with openpkg-20031008-20031008 (see
> http://cvs.openpkg.org/chngview?cn=12699 for details).
> Thanks for the hint.
>
> > [...]
> > 2) Problem running OpenPKG rpm v4.2.1 on RedHat 9: Once I got past the
> > above build problem, I tried running it on two RedHat 9 machines and
> > got the same problem on both:
> >
> > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> > > rpm --version
> > OpenPKG RPM 4.2.1
> > > rpm --db-rebuild
> > rpmdb: REBUILDING NEW FROM OLD RPM DATABASE (/usr/psr.oit/redhat9/RPM/DB)
> > rpmdb: cleaning up RPM database DB region files
> > rpmdb: making sure RPM database contains all possible DB files
> > rpmdb: dumping and reloading RPM database DB file contents
> > rpmdb: rebuilding RPM database (built-in RPM procedure)
> > rpmdb: /usr/psr.oit/redhat9/RPM/DB/__db.001: unable to acquire environment lock: No locks available
> > [...]
> > error: db4 error(37) from dbenv->open: No locks available
> > 25899 stat64("/usr/psr.oit/redhat9/RPM/DB/__db.001", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
> > 25899 open("/usr/psr.oit/redhat9/RPM/DB/__db.001", O_RDWR|O_CREAT|O_EXCL, 0644) = -1 EEXIST (File exists)
> > 25899 open("/usr/psr.oit/redhat9/RPM/DB/__db.001", O_RDWR) = 3
> > 25899 fcntl64(3, F_SETFD, FD_CLOEXEC)   = 0
> > 25899 fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
> > 25899 open("/usr/psr.oit/redhat9/RPM/DB/__db.001", O_RDWR|O_CREAT, 0644) = 4
> > 25899 fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0
> > 25899 lseek(4, 0, SEEK_END)             = 0
> > 25899 lseek(4, 0, SEEK_CUR)             = 0
> > 25899 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> > 25899 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x40038000
> > 25899 close(4)                          = 0
> > 25899 fcntl64(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 ENOLCK (No locks available)
> > 25899 write(2, "rpmdb: ", 7)            = 7
> > 25899 write(2, "/usr/psr.oit/redhat9/RPM/DB/__db"..., 92) = 92
> > 25899 write(2, "\n", 1)                 = 1
> > 25899 close(3)                          = 0
> > 25899 munmap(0x40038000, 8192)          = 0
> > 25899 unlink("/usr/psr.oit/redhat9/RPM/DB/__db.001") = 0
> > 25899 brk(0)                            = 0x818e000
> > 25899 brk(0x8190000)                    = 0x8190000
> > 25899 write(2, "error: ", 7)            = 7
> > 25899 write(2, "db4 error(37) from dbenv->open: "..., 51) = 51
> > 25899 write(2, "error: ", 7)            = 7
> > 25899 write(2, "cannot open Packages index\n", 27) = 27
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Here again, $prefix is on an NFS-mounted file system.
> >
> > Any thoughts as to what might be going wrong?
>
> This error happens just once within RPM/DB:
>
> |     if (!F_ISSET(&renv->mutex, MUTEX_IGNORE) &&
> |         (ret = __db_mutex_lock(dbenv, &renv->mutex)) != 0) {
> |         __db_err(dbenv, "%s: unable to acquire environment lock: %s",
> |             infop->name, db_strerror(ret));
> |         goto err;
> |     }
>
> And the __db_mutex_lock() internally maps into __db_fcntl_mutex_lock()
> which returns the ENOLCK ("No locks available") error only for the
> failing fcntl(2) calls. So, it is really a classical locking problem
> inside Berkeley-DB here and not related to RPM at all.
>
> Remains the question why the fcntl(2) calls fail with ENOLCK.
> Can you run the following program once while staying inside your
> /usr/psr.oit/redhat9/RPM/DB/ directory and once while staying on a local
> filesystem?
>
> -----------------------------------------------------
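> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> int main(int argc, char *argv[])
> {
>     int fd;
>     struct flock l;
>     int rv;
>
>     /* create the test file in the current directory */
>     fd = open("fuck2.db", O_RDWR|O_CREAT, 0644);
>     if (fd < 0) {
>         perror("open");
>         return 1;
>     }
>     /* request a write lock on the whole file (l_len 0 = up to EOF) */
>     l.l_type   = F_WRLCK;
>     l.l_whence = SEEK_SET;
>     l.l_start  = 0;
>     l.l_len    = 0;
>     /* wait until the lock is granted or the call fails */
>     rv = fcntl(fd, F_SETLKW, &l);
>     if (rv == -1)
>         perror("fcntl");
>     printf("rv=%d\n", rv);
>     close(fd);
>     return 0;
> }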
> -----------------------------------------------------
>
> I hope it returns rv=-1 on NFS and rv=0 on local filesystem. My FreeBSD
> manpages for fcntl(2) talk about ENOLCK this way:
>
> | [ENOLCK] The argument cmd is F_SETLK or F_SETLKW, and satisfy-
> |          ing the lock or unlock request would result in the
> |          number of locked regions in the system exceeding a
> |          system-imposed limit.
>
> And some Linux manpages also say:
>
> | ENOLCK
> |     Too many segment locks open, lock table is full, or a remote locking
> |     protocol failed (e.g. locking over NFS).
>
> So NFS may well be the problem.
>
> If I run the above test program on our RedHat 9 box on an NFS filesystem
> mounted from a NetApp filer or a Solaris 8 box, it works fine (rv=0).
> Same on a local filesystem. But if I run it on an NFS filesystem mounted
> from a FreeBSD or Linux box, it fails with rv=-1. The reason is that the
> FreeBSD and Linux boxes do not support NFS locking.
>
> So, what type is your NFS server? If it is FreeBSD or Linux, try
> a different one and repeat? I'm sure it then will work fine...
>
>                                        Ralf S. Engelschall
>                                        [EMAIL PROTECTED]
>                                        www.engelschall.com
>
> ______________________________________________________________________
> The OpenPKG Project                                    www.openpkg.org
> User Communication List                      [EMAIL PROTECTED]
>

______________________________________________________________________
The OpenPKG Project                                    www.openpkg.org
User Communication List                      [EMAIL PROTECTED]
