On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
<[email protected]> wrote:
>
> On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <[email protected]> wrote:
>>
>> Hi,
>>
>> In <CAD21AoDU=byrddy8mzcxafg4h9xtetbdm-wvjao1t4ucsec...@mail.gmail.com>
>> "Re: Make COPY format extendable: Extract COPY TO format implementations"
>> on Mon, 17 Mar 2025 13:50:03 -0700,
>> Masahiko Sawada <[email protected]> wrote:
>>
>> > I think that built-in formats also need to have their handler
>> > functions. This seems to be a conventional way for customizable
>> > features such as tablesample and access methods, and we can simplify
>> > this function.
>>
>> OK. 0008 in the attached v37 patch set does it.
>>
>
> tl/dr;
>
> We need to exclude from our SQL function search any function that doesn't
> declare copy_handler as its return type.
> ("function text must return type copy_handler" is not an acceptable error
> message)
>
> We need to accept identifiers in FORMAT and parse the optional catalog,
> schema, and object name portions.
> (Restrict our function search to the named schema if provided.)
>
> Detail:
>
> Fun thing...(not sure how much of this is covered above: I do see, but didn't
> scour, the security discussion):
>
> -- create some poison
> create function public.text(internal) returns boolean language c as
> '/home/davidj/gotya/gotya', 'gotit';
> CREATE FUNCTION
>
> -- inject it
> postgres=# set search_path to public,pg_catalog;
> SET
>
> -- watch it die
> postgres=# copy (select 1) to stdout (format text);
> ERROR: function text must return type copy_handler
> LINE 1: copy (select 1) to stdout (format text);
>
> I'm especially concerned about extensions here.
>
> We shouldn't be locating any SQL function that doesn't have a copy_handler
> return type. Unfortunately, LookupFuncName seems incapable of doing what we
> want here. I suggest we create a new lookup routine where we can specify the
> return argument type as a required element. That would cleanly mitigate the
> denial-of-service attack/accident vector demonstrated above (the text
> returning function should have zero impact on how this feature behaves). If
> someone does create a handler SQL function without using copy_handler return
> type we'd end up showing "COPY format 'david' not recognized" - a developer
> should be able to figure out they didn't put a correct return type on their
> handler function and that is why the system did not register it.
Just to be clear, the patch checks the function's return type before calling it:
funcargtypes[0] = INTERNALOID;
handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
funcargtypes, true);
if (!OidIsValid(handlerOid))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY format \"%s\" not
recognized", format),
parser_errposition(pstate, defel->location)));
/* check that handler has correct return type */
if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("function %s must return type %s",
format, "copy_handler"),
parser_errposition(pstate, defel->location)));
So would changing the error message to like "COPY format 'text' not
recognized" untangle your concern?
FYI the same is true for TABLESAMPLE; it invokes a function with the
specified method name and checks the returned Node type:
=# select * from pg_class tablesample text (0);
ERROR: function text must return type tsm_handler
A difference between TABLESAMPLE and COPY format is that the former
accepts a qualified name but the latter doesn't:
=# create extension tsm_system_rows ;
=# create schema s1;
=# create function s1.system_rows(internal) returns void language c as
'tsm_system_rows.so', 'tsm_system_rows_handler';
=# \df *.system_rows
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+-------------+------------------+---------------------+------
public | system_rows | tsm_handler | internal | func
s1 | system_rows | void | internal | func
(2 rows)
postgres(1:1194923)=# select count(*) from pg_class tablesample system_rows(0);
count
-------
0
(1 row)
postgres(1:1194923)=# select count(*) from pg_class tablesample
s1.system_rows(0);
ERROR: function s1.system_rows must return type tsm_handler
> A second concern is simply people wanting to name things the same; or, why
> namespaces were invented.
Yeah, I think that the custom COPY format should support qualified
names at least.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com