[HACKERS] Re: New data type: uniqueidentifier

2001-07-02 Thread Thomas Swan



Peter Eisentraut wrote:

  Dmitry G. Mastrukov writes:
  
I've developed new data type for PostgreSQL - unique identifier - 128-bit value claims to be unique across Universe. It depends on libuuid from e2fsprogs by Theodore Ts'o.

ISTM that this should be a function, not a data type.

I'd second the function idea: function uuid( ) returns an int8 value; don't
create a bazillion datatypes. Besides, 128 bit numbers are 7 byte integers.
 PostgreSQL has an int8 (8 byte integer) datatype. While I like the UUID
function idea, I'd recommend a better solution to creating an "unique" identifier.
Why not create a serial8 datatype: int8 with an int8 sequence = 256bit "unique"
number. {Yes, I know I'm violating my first sentence.} Then, you'd have
the same thing (or better) AND you're not relying on randomness. 





Re: [HACKERS] shared library strangeness?

2001-07-02 Thread Bill Studenmund

On Tue, 22 May 2001, Bruce Momjian wrote:

 I am always confused when to bump the minor and when the major.  I also
 was not sure how significant the change would be for apps.  We added
 const, and I changed the return type of one function from short to int. 
 Seems like ConnectionBad was also changed.

Sorry for the delay.

You need to bump the minor whenever you add to the library. You need to
bump the major whenever you delete from the library or change(*) the
interface to a function. i.e. if a program links against the library, as
long as the routine names it linked against behave as it expected at
compile time, you don't need to bump the major.
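
The rule above can be written down as a small decision function. This is only a sketch: the AbiChange and VersionBump types and the required_bump function are made-up names for illustration, not part of any build tool.

```c
#include <stdbool.h>

/* Hypothetical summary of what changed in a library between two releases. */
typedef struct {
    bool added_symbols;      /* new functions or variables were added */
    bool removed_symbols;    /* something programs could link against is gone */
    bool changed_interface;  /* an existing function changed arguments or return type */
} AbiChange;

typedef enum { BUMP_NONE, BUMP_MINOR, BUMP_MAJOR } VersionBump;

/* Apply the rule from the message: deletions and interface changes need a
 * major bump, pure additions need a minor bump. */
VersionBump required_bump(AbiChange c)
{
    if (c.removed_symbols || c.changed_interface)
        return BUMP_MAJOR;
    if (c.added_symbols)
        return BUMP_MINOR;
    return BUMP_NONE;
}
```

Read this way, changing a function's return type from short to int would count as changed_interface and so call for a major bump.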

(*) NetBSD (and I think other OSs too) use a gcc-ism, RENAME, to be able
to change the interface seen by new programs w/o changing the minor
number. What you do is prototype the function as you want it now, and have
a __RENAME(new_name) at the end of the prototype. When you build the
library, you have a routine having the old footprint and old name, and a
new routine with the new footprint and named new_name. Old programs look
for the old name, and get what they expect. New programs look for the new
name, and also get what they expect.
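
A rough sketch of the mechanism, assuming GCC or Clang: an asm label on a prototype sets the symbol name the linker sees, which is, as far as I can tell, the kind of thing a macro like __RENAME expands to. The names conn_status, conn_status_v2, and conn_status_compat are invented for illustration, and a real library would normally keep the old routine in its own source file.

```c
/* New interface: source code calls conn_status(), but the symbol the
 * linker sees is conn_status_v2 (hypothetical name). */
int conn_status(void) __asm__("conn_status_v2");

/* Old interface, kept for binaries built against the old header.  It needs
 * a different C-level name in this file, but is exported as conn_status. */
short conn_status_compat(void) __asm__("conn_status");

int conn_status(void)
{
    return 70000;   /* a value that would not fit in a typical 16-bit short */
}

short conn_status_compat(void)
{
    return 7;       /* old callers still get the short they expect */
}
```

Newly compiled programs see the first prototype and bind to conn_status_v2; programs linked earlier keep resolving the plain conn_status symbol.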

I'm not sure if Postgres needs to go to that much trouble.

Take care,

Bill


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[HACKERS] Buffer access rules, and a probable bug

2001-07-02 Thread Tom Lane

I have been making some notes about the rules for accessing shared disk
buffers, since they aren't spelled out anywhere now AFAIK.  In the process
I found what seems to be a nasty bug in the code that tries to build
btree indexes that include already-dead tuples.  (If memory serves,
Hiroshi added that code awhile back to help suppress the heap tuples
!= index tuples complaint from VACUUM.)

Would people look this over and see if they agree with my deductions?

regards, tom lane


Notes about shared buffer access rules
--------------------------------------

There are two separate access control mechanisms for shared disk buffers:
reference counts (a/k/a pin counts) and buffer locks.  (Actually, there's
a third level of access control: one must hold the appropriate kind of
lock on a relation before one can legally access any page belonging to
the relation.  Relation-level locks are not discussed here.)

Pins: one must hold a pin on a buffer (increment its reference count)
before being allowed to do anything at all with it.  An unpinned buffer is
subject to being reclaimed and reused for a different page at any instant,
so touching it is unsafe.  Typically a pin is acquired by ReadBuffer and
released by WriteBuffer (if one modified the page) or ReleaseBuffer (if not).
It is OK and indeed common for a single backend to pin a page more than
once concurrently; the buffer manager handles this efficiently.  It is
considered OK to hold a pin for long intervals --- for example, sequential
scans hold a pin on the current page until done processing all the tuples
on the page, which could be quite a while if the scan is the outer scan of
a join.  Similarly, btree index scans hold a pin on the current index
page.  This is OK because there is actually no operation that waits for a
page's pin count to drop to zero.  (Anything that might need to do such a
wait is instead handled by waiting to obtain the relation-level lock,
which is why you'd better hold one first.)  Pins may not be held across
transaction boundaries, however.
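
As a toy illustration of the pin rule (not the actual buffer manager code; ToyBuffer, toy_pin, toy_unpin, and toy_can_reclaim are hypothetical names), a pin can be modeled as a reference count that blocks reuse of the buffer while it is nonzero. A real shared-memory version would also need to protect the counter against concurrent updates, which this single-process sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical, single-process model of a shared buffer header. */
typedef struct {
    int page_no;    /* which disk page currently lives in this buffer */
    int refcount;   /* number of pins currently held */
} ToyBuffer;

/* Take a pin; the same caller may pin the same buffer more than once. */
void toy_pin(ToyBuffer *buf)
{
    buf->refcount++;
}

/* Drop one pin; returns false if no pin was held. */
bool toy_unpin(ToyBuffer *buf)
{
    if (buf->refcount <= 0)
        return false;
    buf->refcount--;
    return true;
}

/* Only an unpinned buffer may be reused for a different page. */
bool toy_can_reclaim(const ToyBuffer *buf)
{
    return buf->refcount == 0;
}
```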

Buffer locks: there are two kinds of buffer locks, shared and exclusive,
which act just as you'd expect: multiple backends can hold shared locks
on the same buffer, but an exclusive lock prevents anyone else from
holding either shared or exclusive lock.  (These can alternatively be
called READ and WRITE locks.)  These locks are relatively short term:
they should not be held for long.  They are implemented as per-buffer
spinlocks, so another backend trying to acquire a competing lock will
spin as long as you hold yours!  Buffer locks are acquired and released by
LockBuffer().  It will *not* work for a single backend to try to acquire
multiple locks on the same buffer.  One must pin a buffer before trying
to lock it.
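
The shared/exclusive compatibility described above can be sketched as follows. This is a simplified model under stated assumptions: ToyBufferLock, toy_try_lock, and toy_unlock are hypothetical names, and the sketch returns false where the real per-buffer spinlock would spin until the competing lock is released.

```c
#include <stdbool.h>

typedef enum { TOY_LOCK_SHARED, TOY_LOCK_EXCLUSIVE } ToyLockMode;

/* Hypothetical lock state for one buffer. */
typedef struct {
    int  shared_holders;   /* backends holding a shared lock */
    bool exclusive_held;   /* true while one backend holds the exclusive lock */
} ToyBufferLock;

/* Returns true and records the lock if it can be granted right now. */
bool toy_try_lock(ToyBufferLock *lk, ToyLockMode mode)
{
    if (lk->exclusive_held)
        return false;               /* exclusive blocks everyone else */
    if (mode == TOY_LOCK_SHARED) {
        lk->shared_holders++;       /* any number of readers may coexist */
        return true;
    }
    if (lk->shared_holders > 0)
        return false;               /* a writer must wait for readers */
    lk->exclusive_held = true;
    return true;
}

/* Release a lock previously granted in the given mode. */
void toy_unlock(ToyBufferLock *lk, ToyLockMode mode)
{
    if (mode == TOY_LOCK_EXCLUSIVE)
        lk->exclusive_held = false;
    else if (lk->shared_holders > 0)
        lk->shared_holders--;
}
```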

Buffer access rules:

1. To scan a page for tuples, one must hold a pin and either shared or
exclusive lock.  To examine the commit status (XIDs and status bits) of
a tuple in a shared buffer, one must likewise hold a pin and either shared
or exclusive lock.

2. Once one has determined that a tuple is interesting (visible to the
current transaction) one may drop the buffer lock, yet continue to access
the tuple's data for as long as one holds the buffer pin.  This is what is
typically done by heap scans, since the tuple returned by heap_fetch
contains a pointer to tuple data in the shared buffer.  Therefore the
tuple cannot go away while the pin is held (see rule #5).  Its state could
change, but that is assumed not to matter after the initial determination
of visibility is made.

3. To add a tuple or change the xmin/xmax fields of an existing tuple,
one must hold a pin and an exclusive lock on the containing buffer.
This ensures that no one else might see a partially-updated state of the
tuple.

4. It is considered OK to update tuple commit status bits (ie, OR the
values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
pin on a buffer.  This is OK because another backend looking at the tuple
at about the same time would OR the same bits into the field, so there
is little or no risk of conflicting update; what's more, if there did
manage to be a conflict it would merely mean that one bit-update would
be lost and need to be done again later.

5. To physically remove a tuple or compact free space on a page, one
must hold a pin and an exclusive lock, *and* observe while holding the
exclusive lock that the buffer's shared reference count is one (ie,
no other backend holds a pin).  If these conditions are met then no other
backend can perform a page scan until the exclusive lock is dropped, and
no other backend can be holding a reference to an existing tuple that it
might expect to examine again.  Note that another backend might pin the
buffer (increment the refcount) while one is performing the cleanup, but
it won't be able to actually examine the page until it acquires shared
or 

Re: [HACKERS] selecting from cursor

2001-07-02 Thread Tom Lane

Alex Pilosov writes:
 I'm done with change of RangeTblEntry into three different node types:
 RangeTblEntryRelation,RangeTblEntrySubSelect,RangeTblEntryPortal which
 have different fields. All the existing places instead of using
 rte->subquery to determine type now use IsA(rte, RangeTblEntrySubSelect),
 and later access fields after casting ((RangeTblEntrySubSelect *)rte)->xxx

 Some functions that always work on Relation RTEs are declared to accept
 RangeTblEntryRelation. Asserts are added everywhere before casting of RTE
 into specific type. (Unless there was an IsA before, then I didn't put an
 Assert).

 Let me know if that is an acceptable way of doing things, or casting makes
 things too ugly. (I believe it's the best way, unions are more dangerous
 in this context).

And what are you doing with the places that don't care which kind of RTE
they are dealing with (which is most of them IIRC)?  While you haven't
shown us the proposed changes, I really suspect that a union would be
cleaner, because it'd avoid ugliness in those places.  Bear in mind that
the three RTE types that you have are going to become five or six real
soon now, because I have other things to fix that need to be done that
way --- so the notational advantage of a union is going to increase.
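
To make the union argument concrete, here is a minimal sketch of a tagged union. It is not the proposed patch or the actual node definition: ToyRangeTblEntry, its fields, and both helper functions are hypothetical and only show why code that ignores the entry kind stays cast-free.

```c
#include <stddef.h>

typedef enum { TOY_RTE_RELATION, TOY_RTE_SUBSELECT, TOY_RTE_PORTAL } ToyRteKind;

/* Hypothetical range table entry: common fields plus a per-kind union. */
typedef struct {
    ToyRteKind  kind;     /* says which union member is valid */
    const char *alias;    /* a field every kind of entry has */
    union {
        struct { unsigned int relid; }      relation;
        struct { const char *query_text; }  subselect;
        struct { const char *portal_name; } portal;
    } u;
} ToyRangeTblEntry;

/* Code that does not care about the kind needs no cast and no IsA test. */
const char *toy_rte_alias(const ToyRangeTblEntry *rte)
{
    return rte->alias;
}

/* Kind-specific code switches on the tag before touching the union. */
const char *toy_rte_source(const ToyRangeTblEntry *rte)
{
    switch (rte->kind) {
        case TOY_RTE_SUBSELECT: return rte->u.subselect.query_text;
        case TOY_RTE_PORTAL:    return rte->u.portal.portal_name;
        case TOY_RTE_RELATION:  return NULL;  /* identified by relid, not text */
    }
    return NULL;
}
```

Adding a fourth or fifth kind means one more enum value and one more union member; functions like toy_rte_alias do not change.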

 ... you cannot ReScan a portal. 

That's gonna have to be fixed.  If you're not up for it, don't implement
this.  Given that cursors (are supposed to) support FETCH BACKWARDS,
I really don't see why they shouldn't be expected to handle ReScan...

regards, tom lane




Re: [HACKERS] Re: New data type: uniqueidentifier

2001-07-02 Thread Thomas Swan



I sit corrected. 

*slightly humbled*

Why not do an unsigned int16 to hold your UUID generated numbers?  Ultimately,
this would seem to be a more general solution and accomplish your goals at
the same time. Or, am I completely missing something?

Christopher Kings-Lynne wrote:

  
don't create a bazillion datatypes.  Besides, 128 bit numbers are 7 byte integers.

Hang on:  128 div 8 = 16 byte integer

  PostgreSQL has an int8 (8 byte integer) datatype.
  
  And therefore it is a _64_ bit integer and you can't have a 256bit unique number in it...
  
While I like the UUID function idea, I'd recommend a better solution to creating an "unique" identifier.  Why not create a serial8 datatype: int8 with an int8 sequence = 256bit "unique" number.  {Yes, I know I'm violating my first sentence.}  Then, you'd have the same thing (or better) AND you're not relying on randomness.

Chris
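
The size arithmetic in the correction above can be checked with a few lines of C. ToyUuid128, bytes_for_bits, and bits_in_int8s are hypothetical names used only for this check: 128 bits need 16 bytes, one int8 is 64 bits, and an int8 paired with an int8 sequence gives 128 bits rather than 256.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 128-bit container built from two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} ToyUuid128;

/* Number of whole bytes needed to hold the given number of bits. */
size_t bytes_for_bits(size_t bits)
{
    return (bits + 7) / 8;
}

/* Number of bits in a value made of `count` 8-byte integers. */
size_t bits_in_int8s(size_t count)
{
    return count * 8 * 8;
}
```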