> On 10/15/2010 04:33 AM, Dean Rasheed wrote:
>>
>> I started looking at this last night, but ran out of time. I'll
>> continue this evening / over the weekend.

Continuing my review of this patch...

Usability review
----------------

What the patch does:

This patch adds syntax to allow additional enum values to be added to
an enum type, at any position in the list. The syntax has already been
discussed and a broad consensus reached. To me the new syntax seems
very logical and easy to use/remember.

Do we want that?

Yes, I think so. I can think of a few past projects where I could have
used this. A possible future extension would be the ability to remove
enum values, but that is likely to have additional complications, and
I think the number of use cases for it is smaller. I see no reason to
insist that it be part of this patch.

Do we already have it?

No.

Does it follow SQL spec, or the community-agreed behavior?

Not in the SQL spec, but it does follow agreed behaviour.

Does it include pg_dump support (if applicable)?

Yes.

Are there dangers?

None that I can think of.

Have all the bases been covered?

I just noticed that a couple of columns have been added to pg_type, so
there's another bit documentation that needs updating.

Feature test
------------

Does the feature work as advertised?

Yes.

Are there corner cases the author has failed to consider?

None that I could find.

Are there any assertion failures or crashes?

Yes, I'm afraid I managed to provoke a crash by running the regression
tests with -DCATCACHE_FORCE_RELEASE, after spotting a suspicious bit
of code (see below).

Performance review
------------------

Does the patch slow down simple tests?

There is a documented performance penalty when working with enum
values that are not in OID order. To attempt to measure this I created
2 tables, foo and bar, each with 800,000 rows. foo had an enum column
using the planets enum from the regression test, with values not in
OID order. bar had a similar enum column, but with values in the
default OID order.

Performance differences for comparison-heavy operations are most noticable:

SELECT MAX(p) FROM foo;
   max
---------
 neptune
(1 row)

Time: 132.305 ms

SELECT MAX(p) FROM bar;
   max
---------
 neptune
(1 row)

Time: 93.313 ms


SELECT p FROM foo ORDER BY p OFFSET 500000 LIMIT 1;
   p
--------
 saturn
(1 row)

Time: 1516.725 ms

SELECT p FROM bar ORDER BY p OFFSET 500000 LIMIT 1;
   p
--------
 saturn
(1 row)

Time: 1043.010 ms

(optimised build, asserts off)

That's a bigger performance hit than a I would have expected, even
though it is documented.

enum_ccmp() is using bsearch(), so there's a fair amount of function
call overhead there, and perhaps for small enums, a simple linear scan
would be faster.  I'm not sure to what extent this is worth worrying
about.

Code review
-----------

I'm not familar enough with the pg_upgrade code to really comment on it.

Looking at AddEnumLabel() I found it a bit hard to follow, but near
the end of the block used when BEFORE/AFTER is specified, it does
this:

        ReleaseCatCacheList(list);

        /* are the labels sorted by OID? */
        if (result && newelemorder > 1)
                result = newOid > HeapTupleGetOid(existing[newelemorder-2]);
        if (result && newelemorder < nelems + 1)
                result = newOid < HeapTupleGetOid(existing[newelemorder-1]);

It looks to me as though 'existing[...]' is a pointer into a member of
the list that's just been freed, so it risks reading freed memory.
That seems to be confirmed by running the tests with
-DCATCACHE_FORCE_RELEASE. Doing so causes a number of the tests to
fail/crash, but I didn't dig any deeper to confirm that this was the
cause.

For the most part, this patch looks good, but I think there is still a
bit of tidying up to do.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to