Re: [HACKERS] Solving the OID-collision problem

mark Thu, 04 Aug 2005 11:14:53 -0700

On Thu, Aug 04, 2005 at 12:20:24PM -0400, Tom Lane wrote:
> "Mark Woodward" <[EMAIL PROTECTED]> writes:
> >> I'm too lazy to run an experiment, but I believe it would.  Datum is
> >> involved in almost every function-call API in the backend. In
> >> particular this means that it would affect performance-critical code
> >> paths.
> > I hear you on the "lazy" part, but if OID becomes a structure, then you
> > are still comparing a native type until you get a match, then you make one
> > more comparison to confirm it is the right one, or move on.
> No, you're missing the point entirely: on 32-bit architectures, passing
> a 32-bit integral type to a function is an extremely well optimized
> operation, as is returning a 32-bit integral type.  Passing or
> returning a 64-bit struct is, um, not so well optimized.


I don't think this is necessarily true. For example, instead of passing
the 32-bit integer around, you would instead be passing a 32-bit pointer
to a data structure. This doesn't have to be expensive - although,
depending on the state of the API, it may require extensive changes to
make it inexpensive (or not - I don't know).
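
A minimal sketch of the pointer-passing idea, assuming a hypothetical
two-field identifier. `WideOid`, its fields, and `wide_oid_equal` are
illustrative names only, not PostgreSQL types or functions. The caller
hands over one pointer-sized argument per identifier, and the comparison
checks the native 32-bit value first, looking at the second field only
on a match:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical widened identifier; NOT an actual PostgreSQL type. */
typedef struct WideOid {
    uint32_t domain;  /* assumed: which counter issued the value */
    uint32_t value;   /* the familiar 32-bit value */
} WideOid;

/*
 * Each argument is a single pointer (one machine word on a 32-bit
 * target) rather than a 64-bit struct passed by value.
 */
static bool wide_oid_equal(const WideOid *a, const WideOid *b)
{
    assert(a != NULL && b != NULL);
    /* Cheap native comparison first; the domain is checked only on a match. */
    return a->value == b->value && a->domain == b->domain;
}
```

Whether this is actually cheap depends on where the structs live and who
owns them, which is exactly the API question raised above.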

From my perspective (new to this list - could be good, or could be bad)
the concept of the OID was too generalized. As a generalization, it
appears to have originally been intended to uniquely identify every
row in the database (system tables and user tables). As a generalization,
32-bits was not enough to represent every row in the database. It was a
mistake.

The work-around for this mistake was to allow user tables to be
specially defined so that they do not unnecessarily steal range from the
OID space. This work-around proved desirable enough that, as of
PostgreSQL 8.1, tables are no longer created with OIDs by default. It is
still a work-around. What it has purchased is time to properly address
the problem. The problem has not been solved.

I see a few ways to solve this:

    1) Create OID domains. The system tables could have their own OID
       counter separate from the user table OID counters. Tables that
       have no relationship to each other would be put in their own
       OID domain. It isn't as if you can map from row OID to table
       anyways, so any use of OID assumes knowledge of the table
       relationships. I see this as being relatively cheap to implement,
       with no impact on backwards compatibility, except in unusual cases
       where people have seriously abused the concept of an OID. This
       is another delay tactic, in that a sufficient number of changes
       to the system tables would still cause a wrap-around. However,
       it is equivalent to, or better than, the suggestion that all user
       tables be created without OIDs, as it at least allows user tables
       to use OIDs again.

    2) Enlarge the OID to be 64-bit or 128-bit. I don't see this as
       necessarily being a performance problem; however, it might require
       significant changes to the API, which would be expensive. It might
       be argued that enlarging the OID merely delays the problem, and
       doesn't actually address it. Perhaps delaying it by a factor of
       2^32 is effectively delaying it indefinitely, or perhaps not. Those
       who thought 32 bits would be enough, or those who thought 2-digit
       years would be enough, underestimated the problem. Compatibility
       can be mostly maintained, although the databases would probably
       need to be upgraded, and applications that assumed that the OID
       could fit into a 32-bit integer would break.

    3) Leave OIDs as the general database-wide row identifier, and don't
       use OIDs to identify system metadata. Instead, use a UUID (128-bit)
       or similar. System tables are special. Why shouldn't they have a
       non-general means of identifying stored metadata? This has some
       of the benefits of 1), all of the costs of 2), and it additionally
       breaks compatibility for everything.
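
To put rough numbers on the "merely delays the problem" question in 2),
here is a back-of-the-envelope sketch. The allocation rate is an assumed
input, not a measurement of any real system, and the function names are
hypothetical:

```c
#include <assert.h>

/*
 * Seconds until a counter of the given bit width has issued every
 * possible value, at a constant (assumed) rate in OIDs per second.
 */
static double seconds_to_exhaust(int bits, double oids_per_second)
{
    double space = 1.0;
    int i;

    assert(bits > 0 && bits <= 128);
    assert(oids_per_second > 0.0);
    for (i = 0; i < bits; i++)
        space *= 2.0;  /* powers of two are exactly representable in a double */
    return space / oids_per_second;
}

/* Convert seconds to years, using a 365.25-day year. */
static double seconds_to_years(double seconds)
{
    return seconds / (365.25 * 24.0 * 3600.0);
}
```

At an assumed one million allocations per second,
`seconds_to_exhaust(32, 1e6)` comes to roughly 72 minutes, while the
64-bit figure passed through `seconds_to_years` is on the order of
580,000 years. Whether that counts as "indefinitely" depends on the rate
one is willing to assume.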

Of the options above, I see 1) as the best short- and medium-term
route. How hard would it be? Instead of a database-wide OID counter, we
would have several OID counters, with each table associated with one of
them. Assuming the OID domains are properly defined, all existing code
continues to function properly, and wrap-around of the OID in one domain
doesn't break the other domains, such as the system tables.
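
A minimal sketch of what "several OID counters" could look like. All
names here (`OidDomain`, `oid_domain_init`, `oid_domain_next`,
`MAX_OID_DOMAINS`) are hypothetical and are not taken from the
PostgreSQL source; the sketch ignores locking, persistence, and
collision checks after a wrap:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

#define INVALID_OID      ((Oid) 0)  /* 0 is never handed out */
#define MAX_OID_DOMAINS  8          /* assumed limit, for illustration */

/* One independent counter per domain (e.g. system tables vs. user tables). */
typedef struct OidDomain {
    Oid next_oid;
} OidDomain;

static OidDomain oid_domains[MAX_OID_DOMAINS];

/* Set the starting point of a domain's counter. */
static void oid_domain_init(int domain, Oid start)
{
    assert(domain >= 0 && domain < MAX_OID_DOMAINS);
    oid_domains[domain].next_oid = (start == INVALID_OID) ? 1 : start;
}

/* Hand out the next OID from one domain; other domains are untouched. */
static Oid oid_domain_next(int domain)
{
    OidDomain *d;
    Oid result;

    assert(domain >= 0 && domain < MAX_OID_DOMAINS);
    d = &oid_domains[domain];
    result = d->next_oid;
    d->next_oid++;                    /* unsigned arithmetic wraps to 0 */
    if (d->next_oid == INVALID_OID)
        d->next_oid = 1;              /* skip the invalid value on wrap */
    return result;
}
```

In this sketch a table would store the index of its domain, and a
wrap-around in one domain changes only that domain's `next_oid`.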

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]     
__________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/

