Re: OpenPKG 4.0.7 RPM lockfiles permission problems solaris 10

2010-09-07 Thread Roman Gaechter
0   0   0   0   0
0   0   1   0   0
0   1   1   1   1
0   0   0   0   0
0   0   1   0   1
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker   Mode  Count Status  - Object ---
 407 dd= 0 locks held 0 write locks 0 pid/thread 7707/0
 408 dd= 0 locks held 1 write locks 0 pid/thread 7713/0
 408 READ  1 HELD (2808e 4010004 a200b853 67dd 0) handle 0

 409 dd= 0 locks held 0 write locks 0 pid/thread 7713/0
 40a dd= 0 locks held 1 write locks 0 pid/thread 7713/0
 40a READ  1 HELD (2808f 4010004 a14ac504 1ee7d 0) handle 0

 40b dd= 0 locks held 1 write locks 0 pid/thread 7705/0
 40b READ  1 HELD (2808e 4010004 a200b853 67dd 0) handle 0

 40c dd= 0 locks held 0 write locks 0 pid/thread 7705/0
 40d dd= 0 locks held 1 write locks 0 pid/thread 7705/0
 40d READ  1 HELD (2808f 4010004 a14ac504 1ee7d 0) handle 0

 40e dd= 0 locks held 1 write locks 0 pid/thread 7707/0
 40e READ  1 HELD (2808e 4010004 a200b853 67dd 0) handle 0

 40f dd= 0 locks held 0 write locks 0 pid/thread 7707/0
 410 dd= 0 locks held 1 write locks 0 pid/thread 7707/0
 410 READ  1 HELD (2808f 4010004 a14ac504 1ee7d 0) handle 0


.

 3ff dd= 0 locks held 1 write locks 0 pid/thread 4509/0
 3ff READ  1 HELD (2808e 4010004 a200b853 67dd 0) handle 0

 400 dd= 0 locks held 0 write locks 0 pid/thread 4509/0
 401 dd= 0 locks held 1 write locks 0 pid/thread 4509/0
 401 READ  1 HELD (2808f 4010004 a14ac504 1ee7d 0) handle

.

Any hints from this output?


Best regards
Roman

On Aug 31, 2010, at 4:23 PM, Roman Gaechter wrote:

The problem is intermittent and we are not able to reproduce it at will.


Intermittent is what makes it tricky to identify.


At the moment db_verify returns no error.


So technically there is no corruption if db_verify doesn't detect any.

rpm -qavv will show the data integrity checks: each installed header
has a SHA1 digest attached, and with -vv you should see confirmation
that each SHA1 digest check passes.


Maybe the output of db_stat -CA will give a clue?



If a hang is the issue, running db_stat will show what locks are held.

The output you have attached is from a quiescent system afaict.
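
The "quiescent" reading can be checked from the numbers themselves: in the db_stat -CA output Roman posted, the counters for current locks, lockers and lock objects are all 0, and both lock lists are empty. A minimal Python sketch of that check follows, assuming the "number, then description" line layout seen in this thread (including the mail-mangled lines where the spacing is lost, such as "1000Maximum number of locks possible"). parse_lock_stats and looks_quiescent are hypothetical helper names, not part of Berkeley DB or RPM, and "all three counters are 0" is only a heuristic.

```python
import re

# Each statistic line is "<number><spaces><description>"; the spacing is
# sometimes lost in mail (e.g. "1000Maximum number of locks possible").
STAT_RE = re.compile(r"^\s*(\d+)\s*([A-Za-z].*?)\s*$")

# Heuristic (an assumption): these counters are all 0 when nobody holds a lock.
ACTIVITY_KEYS = (
    "Number of current locks",
    "Number of current lockers",
    "Number of current lock objects",
)


def parse_lock_stats(text):
    """Map each description in db_stat -CA text to its leading integer."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line)
        if m:
            # Keep the first occurrence if a description repeats.
            stats.setdefault(m.group(2), int(m.group(1)))
    return stats


def looks_quiescent(stats):
    """True if every activity counter is present and equal to 0."""
    return all(stats.get(key) == 0 for key in ACTIVITY_KEYS)


if __name__ == "__main__":
    sample = (
        "1000Maximum number of locks possible\n"
        "0   Number of current locks\n"
        "48  Maximum number of locks at any one time\n"
        "0   Number of current lockers\n"
        "0   Number of current lock objects\n"
    )
    stats = parse_lock_stats(sample)
    print(stats["Maximum number of locks possible"], looks_quiescent(stats))
    # prints: 1000 True
```

Nonzero counters only say that some process held a lock at the moment db_stat ran; by themselves they do not prove a hang.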

Try running rpm -qavv as before (as root, so that locks can be acquired).
In the middle, hit ^Z, then run db_stat -Cl.
You should see a shared/read lock from the running rpm -qa in the
db_stat output.
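
Picking the locks of a suspended rpm out of a long db_stat -Cl listing is easier with a small filter. The Python sketch below groups the lines under "Locks grouped by lockers" by the pid in each locker's pid/thread field. It assumes the line layout shown in this thread, tolerates the lost spacing and the wrapped "handle" lines, and stops at "Locks grouped by object" so each lock is listed only once. locks_by_pid and the regular expressions are hypothetical, not an official db_stat parser.

```python
import re
from collections import defaultdict

# Locker summary, e.g. " 408 dd= 0 locks held 1 write locks 0 pid/thread 7713/0".
# The \s* parts tolerate mail-mangled spacing ("1write locks 0pid/thread").
LOCKER_RE = re.compile(
    r"^\s*([0-9a-f]+)\s+dd=\s*\d+\s+locks held\s*(\d+)\s*"
    r"write locks\s*(\d+)\s*pid/thread\s+(\d+)/(\d+)"
)
# Lock line, e.g. " 408 READ  1 HELD (2808e 4010004 a200b853 67dd 0) handle 0".
LOCK_RE = re.compile(r"^\s*([0-9a-f]+)\s+([A-Z_]+)\s+(\d+)\s+([A-Z]+)\s*(.*)$")


def locks_by_pid(text):
    """Group locks from the 'grouped by lockers' section by owning pid.

    Returns {pid: [(locker, mode, status, object), ...]}; the key is None
    when a lock line's locker summary line was not seen.
    """
    pid_of = {}
    result = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("Locks grouped by object"):
            break  # the same locks are listed again there
        m = LOCKER_RE.match(line)
        if m:
            pid_of[m.group(1)] = int(m.group(4))
            continue
        m = LOCK_RE.match(line)
        if m:
            locker, mode, _count, status, obj = m.groups()
            result[pid_of.get(locker)].append((locker, mode, status, obj.strip()))
    return dict(result)
```

Applied to the listing Roman posted on 2010-09-07, it would place READ locks under pids 7713, 7705, 7707 and 4509; a pid that is neither the suspended rpm nor another running rpm would be the first thing to look at.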


Each lock has a mode, like READ or WRITE, and a status, like HELD.

And one cause of hangs is a deadlock or a stale lock (though
rpm should detect and clean up stale locks automatically these days).

One way to see stale lock cleanup is to start an rpm install
operation and then send kill -9 to rpm to terminate it immediately.

The next rpm invocation (as root) should display messages about
cleaning up the stale locks left by the install that was terminated
with kill -9.
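
A stale lock is one whose owning process no longer exists. The Python sketch below shows that idea from the outside, using a POSIX signal-0 check (os.kill(pid, 0)) on pids taken, for example, from the pid/thread column of db_stat -Cl. It is a diagnostic aid under stated assumptions, not the mechanism rpm or Berkeley DB use internally; pid_is_alive and dead_lock_holders are hypothetical names. Pids can be reused, and on Solaris the check should run in the same zone as rpm, because a non-global zone cannot see processes in other zones.

```python
import os


def pid_is_alive(pid):
    """Best-effort POSIX check whether a process with this pid exists."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one pid
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def dead_lock_holders(pids):
    """Return the pids (deduplicated, sorted) that no longer exist."""
    return sorted(p for p in set(pids) if not pid_is_alive(p))
```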

Those are basically the techniques for diagnosing hangs with RPM and
Berkeley DB.


hth

73 de Jeff



Re: OpenPKG 4.0.7 RPM lockfiles permission problems solaris 10

2010-08-31 Thread Roman Gaechter

The problem is intermittent and we are not able to reproduce it at will.
At the moment db_verify returns no error.
Maybe the output of db_stat -CA will give a clue?

db_tool db_stat -CA

Default locking region information:
629 Last allocated locker ID
0x7fff  Current maximum unused locker ID
5   Number of lock modes
1000    Maximum number of locks possible
1000    Maximum number of lockers possible
1000    Maximum number of lock objects possible
160 Number of lock object partitions
0   Number of current locks
48  Maximum number of locks at any one time
4   Maximum number of locks in any one bucket
0   Maximum number of locks stolen by for an empty partition
0   Maximum number of locks stolen for any one partition
0   Number of current lockers
12  Maximum number of lockers at any one time
0   Number of current lock objects
34  Maximum number of lock objects at any one time
1   Maximum number of lock objects in any one bucket
0   Maximum number of objects stolen by for an empty partition
0   Maximum number of objects stolen for any one partition
12898   Total number of locks requested
12898   Total number of locks released
90  Total number of locks upgraded
495 Total number of locks downgraded
4   Lock requests not available due to conflicts, for which we waited
0   Lock requests not available due to conflicts, for which we did not wait

0   Number of deadlocks
0   Lock timeout value
0   Number of locks that have timed out
0   Transaction timeout value
0   Number of transactions that have timed out
512KB   The size of the lock region
11  The number of partition locks that required waiting (0%)
7   The maximum number of times any partition lock was waited for (0%)
0   The number of object queue operations that required waiting (0%)
3   The number of locker allocations that required waiting (0%)
0   The number of region locks that required waiting (0%)
1   Maximum hash bucket length
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock    Region type
4   Region ID
__db.004        Region name
0xfed0  Original region address
0xfed0  Region address
0xfed000c8  Region primary address
0   Region maximum allocation
0   Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
Allocations by power-of-two sizes:
  1KB   3002
  2KB   0
  4KB   0
  8KB   1
 16KB   2
 32KB   0
 64KB   1
128KB   0
256KB   0
512KB   0
1024KB  0
REGION_JOIN_OK  Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock region parameters:
283 Lock region region mutex [0/223 0% 23806/0]
1031    locker table size
1031    object table size
560 obj_off
69560   locker_off
1   need_dd
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock conflict matrix:
0   0   0   0   0
0   0   1   0   0
0   1   1   1   1
0   0   0   0   0
0   0   1   0   1
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker   Mode  Count Status  - Object ---
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker   Mode  Count Status  - Object ---

Regards
Roman


On 08/31/2010 02:20 PM, Jeff Johnson wrote:


On Aug 31, 2010, at 4:23 AM, Roman Gächter wrote:


Hello

We are running into serious problems with OpenPKG 4.0.2 and 4.0.7 on
Solaris 10 SPARC in Solaris whole-root zones.


After some days of normal operation our rpm db gets corrupted.



(aside)
Technically, corrupted means data loss or integrity failure
(although it's quite common to hear corruption applied
to Berkeley DB with other meanings).

Is there data loss or integrity failure? Running db_verify

cd /var/lib/rpm
db_verify Packages

against Packages or other tables (Berkeley DB calls
tables databases) will detect integrity failures.

Data loss can be detected by, say, counting the number
of packages installed like
rpm -qa | wc -l
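
A count only shows that the number changed. Keeping an earlier rpm -qa output also shows which packages disappeared. A minimal Python sketch, assuming both outputs are available as text (for example saved with rpm -qa at a known-good time); installed_packages and compare_with_baseline are hypothetical names, and the package names in the test are made up.

```python
def installed_packages(rpm_qa_output):
    """Set of package names from rpm -qa output, one package per line."""
    return {line.strip() for line in rpm_qa_output.splitlines() if line.strip()}


def compare_with_baseline(baseline_output, current_output):
    """Return (missing, added): packages that disappeared / newly appeared."""
    before = installed_packages(baseline_output)
    now = installed_packages(current_output)
    return sorted(before - now), sorted(now - before)
```

Intended installs and erases show up here too, so the baseline should be taken after the last deliberate change.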


We see hanging processes from the openpkg rc facility.
The openpkg rc script tests the rpm db with an "openpkg rpm -q openpkg"
query.

These queries hang up.
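
One way to catch such a hang without waiting on it forever is to start the same query with a time limit. The Python sketch below does that but deliberately does not kill a query that exceeds the limit, so its locks can still be inspected with db_stat -Cl first; killing it with kill -9, as Jeff describes elsewhere in the thread, can itself leave a stale lock. start_and_check and the 60-second default are assumptions, not part of the openpkg rc facility.

```python
import subprocess


def start_and_check(cmd, timeout_s=60.0):
    """Start cmd and wait up to timeout_s seconds.

    Returns (hung, proc). A query that is still running is NOT killed,
    so its locks can be examined before deciding what to do with it.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True, proc
    return False, proc
```

For the query in question this would be called as start_and_check(["openpkg", "rpm", "-q", "openpkg"]); if hung is True, db_stat -Cl can be run while proc is still alive.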



Hang up how? Can you add -vv to a query? There is also
cd  /var/lib/rpm
db_stat -CA # -- -CA displays all info, -Cl displays lock info

Note that you MUST use db_verify/db_stat from the same version of
Berkeley DB that RPM uses. If that Berkeley DB is internal to RPM,
the matching tools are released with RPM.


When this happens, the ownership of the rpm lockfiles RPM/DB/__db.001,
.002 and .003 changes to root.



An rpmdb is protected by permissions on /var/lib/rpm.

The associated __db* files are created as needed.

I would expect root as owner typically, because /var/lib/rpm is
usually writable only by root, and rpm installs are typically run as root.
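
Because the complaint is about who owns the __db.* files, recording their owners periodically could tie an ownership change to a time and to whatever ran then. A minimal Python sketch, assuming a POSIX system with the pwd module; the directory (/var/lib/rpm for stock rpm, or the RPM/DB directory of the OpenPKG instance) and the expected owner are parameters, because the right owner depends on the installation. db_file_owners and unexpected_owners are hypothetical names.

```python
import glob
import os
import pwd


def db_file_owners(db_dir):
    """Map each __db.* region file in db_dir to its owner's user name."""
    owners = {}
    for path in sorted(glob.glob(os.path.join(db_dir, "__db.*"))):
        uid = os.stat(path).st_uid
        try:
            name = pwd.getpwuid(uid).pw_name
        except KeyError:
            name = str(uid)  # uid without a passwd entry
        owners[os.path.basename(path)] = name
    return owners


def unexpected_owners(owners, expected):
    """Subset of owners whose user differs from the expected one."""
    return {f: u for f, u in owners.items() if u != expected}
```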


But the openpkg wrapper appears