[HACKERS] regproc's lack of certainty is dangerous

Tom Lane Wed, 12 Mar 2003 12:49:54 -0800

Deepak Bhole of Red Hat just pointed out to me a failure that he got
after some 7.3 stress testing:


> [ "[EMAIL PROTECTED]&*()''| \final_db\n,.;:'" ]=# SELECT n.nspname, p.proname,
> o.oprcode::oid FROM pg_operator o, pg_proc p, pg_namespace n WHERE
> o.oid=270447::oid AND p.oid=o.oprcode::oid AND p.pronamespace=n.oid;

> ERROR:  There is more than one procedure named "[ ""[EMAIL PROTECTED]&*()''|
> \final_schema\n,.;:'"" ]"."[ ""[EMAIL PROTECTED]&*''| \ {func_for_op}\n,.;:'"" ]"

This error comes out of regprocin() when it finds multiple candidate
functions with the same name (and, presumably, different argument lists
or different schemas).  No big surprise, since that's what he had.
But it's a bit odd that regprocin() is being invoked, when there's no
regproc literal in the given query.

After some digging, it turns out that the error is appearing because
that function name is present in the pg_statistic entry for
pg_operator.oprcode.  *Any* query that causes the optimizer to become
interested in pg_operator.oprcode will fail under these circumstances
:-(.  And the user can't readily avoid this, since there's no way to
be sure which function names will happen to end up in the histogram.
"Never ANALYZE the table" isn't going to fly as a workaround.

I am not real sure what we should do about it.  Clearly there is more
risk than I'd realized in datatypes whose input routines may reject
strings that their output routines had produced in good faith.

One possible route is to try to eliminate the ambiguity, but I doubt
that that will work very effectively for regproc and friends --- the
whole point of those types is to resolve ambiguous input, and so the
possibility of failures in the input routine can't easily be removed.
regproc in particular needs its special behavior to be useful for
bootstrapping.

Another approach is to try to fix pg_statistic to avoid the problem by
not doing I/O conversions.  For scalar datatypes (those that have
associated array types) it'd be probably be feasible to store the
histogram and most-common-value arrays as arrays of the datatype itself,
not arrays of text; that should be a win for performance as well as
avoiding risky conversions.  I am not sure what to do about columns that
have datatypes without matching array types, though (that would include
array columns and domains, IIRC).  Maybe use array of bytea to hold the
internal representation of the type?

Any comments or better ideas out there?

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

[HACKERS] regproc's lack of certainty is dangerous

Reply via email to