Collisions and retries are normal.  If they did not happen, there would be
no point in having semaphores.  The important question here is how long the
system had been up when this semaphore listing was taken.  A small, slowly
increasing collision count is no problem.  A large, rapidly climbing value
is symptomatic of trouble.  The ratio of the retry count to the collision
count gives an indication of how long the user had to wait for the semaphore
but is highly unreliable because of the varying implementations of
semaphores in different operating systems.
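
As a rough illustration of why uptime matters, here is a minimal Python
sketch that turns the raw counters into rates.  The function name and the
figures are made up for the example; they are not taken from any real
listing.

```python
def semaphore_rates(collisions, retries, uptime_hours):
    """Return (collisions per hour, retries per collision).

    The second figure is only a crude hint at wait time, for the
    reasons given above.
    """
    per_hour = collisions / uptime_hours if uptime_hours > 0 else float("inf")
    retries_per_collision = retries / collisions if collisions else 0.0
    return per_hour, retries_per_collision


# Identical counters, very different stories (illustrative numbers only).
print(semaphore_rates(12000, 30000, 720))  # system up for a month
print(semaphore_rates(12000, 30000, 2))    # system up for two hours
```

The same collision count is unremarkable after a month and alarming after
two hours, which is why the counters mean little without the uptime.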

It is interesting that the T30 file semaphore was owned at the time you ran
this report.  This tends to suggest (unless it was a fluke) that this
semaphore is being held for excessive periods.

I suspect that the problem here may not be to do with large dynamic files at
all but with small dynamic files (unless the files are extremely dynamic in
content) combined with record locking.  Here is my reasoning.  If someone in
the UV development group would like to deny any of this and put me right I
would be very grateful as I teach the UniVerse internals course for IBM here
in the UK and would hate to be talking rubbish! (I have never seen any of
the source code involved.  All "facts" are based on public domain
documentation or experiment).

Dynamic files change their modulo value automatically to respond to the
volume of data stored in them. Because this is based on the percentage of
the file space that is used, a small dynamic file tends to split or merge
more frequently than a large dynamic file.  Indeed, a very large dynamic
file may only split every few thousand writes so the "cost" of dynamic files
becomes negligible.
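
To put some numbers on this for a file whose contents fluctuate, here is a
simplified Python model.  It assumes a split load of 80%, a merge load of
50% and 2048 byte groups; these are assumptions for illustration, and the
real thresholds are set per file.  It shows how many bytes of net change a
file can absorb before its modulus has to change.

```python
GROUP_SIZE = 2048     # bytes per group (assumed for illustration)
SPLIT_LOAD_PCT = 80   # split when load rises above this (assumed)
MERGE_LOAD_PCT = 50   # merge when load falls below this (assumed)


def stable_band(modulus):
    """Return (low, high) byte counts between which the modulus stays put."""
    capacity = modulus * GROUP_SIZE
    low = capacity * MERGE_LOAD_PCT // 100
    high = capacity * SPLIT_LOAD_PCT // 100
    return low, high


def band_width(modulus):
    """Bytes of net change the file can absorb without a split or merge."""
    low, high = stable_band(modulus)
    return high - low


for modulus in (10, 1000, 100000):
    print(modulus, stable_band(modulus), band_width(modulus))
```

Because the thresholds are percentages, the band is a hundred times wider
for a file with a hundred times the modulus, so the same day-to-day churn
that keeps a small file splitting and merging barely touches a large one.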

The process of splitting or merging redistributes records between two (and
only two) groups, so some records end up in a different group.  So far, so
good.  The potential problem comes with locked records.
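
The "two and only two groups" behaviour is what a textbook linear hashing
scheme gives you.  The sketch below assumes such a scheme; it is not taken
from the UniVerse source, and the function names and the use of plain
integers as hash values are mine.

```python
def group_for(key_hash, modulus):
    """Textbook linear hashing: map a hash value to a group 0..modulus-1."""
    n = 1
    while n < modulus:
        n *= 2                   # smallest power of two >= modulus
    group = key_hash % n
    if group >= modulus:         # that group has not been created yet
        group = key_hash % (n // 2)
    return group


def moved_by_split(hashes, modulus):
    """Return {hash: (old_group, new_group)} for records moved by one split."""
    moved = {}
    for h in hashes:
        old, new = group_for(h, modulus), group_for(h, modulus + 1)
        if old != new:
            moved[h] = (old, new)
    return moved


moves = moved_by_split(range(1000), 5)
print(set(moves.values()))       # every move is from group 1 to group 5
```

Going from modulus 5 to 6, only some of the records in group 1 move, and
they all go to the new group 5.  Every other group is untouched.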

The UniVerse locking system uses a two dimensional lock table in which the
row on which a lock appears is determined mathematically and is based on the
group number, modulo and inode number of the file.  If a split or merge
affects a group in which there are locked records, the lock table entry has
to be moved.  This is a non-trivial task and, because the shared memory T30
file structure that holds the file parameters would need to remain locked
across this operation, could result in collisions as other users try to
update the table.  To make things worse, UniVerse's use of informational
group locks as a (very effective) performance boost to most locking
operations means that it is necessary to update both the record lock and
group lock tables.
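
To make the idea concrete, here is a purely hypothetical Python sketch.  The
real row calculation is not public, so the formula, the table size of 23
rows and all the names below are invented for illustration.  The only point
is that a row derived from the group number has to change when a split
moves the locked record to another group.

```python
LOCK_ROWS = 23  # hypothetical number of rows in the lock table


def group_for(key_hash, modulus):
    """Textbook linear hashing group calculation (an assumption)."""
    n = 1
    while n < modulus:
        n *= 2
    group = key_hash % n
    if group >= modulus:
        group = key_hash % (n // 2)
    return group


def lock_row(inode, key_hash, modulus):
    """Hypothetical row formula built from inode, modulus and group number."""
    return (inode + group_for(key_hash, modulus)) % LOCK_ROWS


INODE = 4242  # made-up inode number
for key_hash in (9, 13):
    before = lock_row(INODE, key_hash, 5)
    after = lock_row(INODE, key_hash, 6)
    print(key_hash, before, after, "moves" if before != after else "stays")
```

In this toy model the lock on the record that stays in its group keeps its
row, while the lock on the record that is moved by the split must be
relocated to a different row.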

One experiment I have never done, and keep thinking I must find the time to
do, is to find out what happens if a file splits or merges but there is a
lock present that needs to move to a row of the lock table that is already
full.  Hopefully, it does not just hang on to the T30File semaphore until a
space becomes available.

Perhaps next time this problem occurs you should run "analyze.shm -sr" to
show the semaphore table and the record/group lock tables.


Martin Phillips
Ladybridge Systems
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB
+44-(0)1604-709200
-------
u2-users mailing list
[EMAIL PROTECTED]
To unsubscribe please visit http://listserver.u2ug.org/
