On Thu, Dec 21, 2023 at 5:35 PM Sutou Kouhei <k...@clear-code.com> wrote: > > Hi, > > In <cad21aocunywhird3gapzwe6s9jg1wzxj3cr6vgn36ddhegj...@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Mon, 11 Dec 2023 23:31:29 +0900, > Masahiko Sawada <sawada.m...@gmail.com> wrote: > > > I've sketched the above idea including a test module in > > src/test/module/test_copy_format, based on v2 patch. It's not splitted > > and is dirty so just for discussion. > > I implemented a sample COPY TO handler for Apache Arrow that > supports only integer and text. > > I needed to extend the patch: > > 1. Add an opaque space for custom COPY TO handler > * Add CopyToState{Get,Set}Opaque() > > https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944 > > 2. Export CopyToState::attnumlist > * Add CopyToStateGetAttNumList() > > https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688 > > 3. Export CopySend*() > * Rename CopySend*() to CopyToStateSend*() and export them > * Exception: CopySendEndOfRow() to CopyToStateFlush() because > it just flushes the internal buffer now. > > https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e > I guess the purpose of these helpers is to avoid expose CopyToState to copy.h, but I think expose CopyToState to user might make life easier, users might want to use the memory contexts of the structure (though I agree not all the fields are necessary for extension handers).
> The attached patch is based on the Sawada-san's patch and > includes the above changes. Note that this patch is also > dirty so just for discussion. > > My suggestions from this experience: > > 1. Split COPY handler to COPY TO handler and COPY FROM handler > > * CopyFormatRoutine is a bit tricky. An extension needs > to create a CopyFormatRoutine node and > a CopyToFormatRoutine node. > > * If we just require "copy_to_${FORMAT}(internal)" > function and "copy_from_${FORMAT}(internal)" function, > we can remove the tricky approach. And it also avoid > name collisions with other handler such as tablesample > handler. > See also: > > https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9 > > 2. Need an opaque space like IndexScanDesc::opaque does > > * A custom COPY TO handler needs to keep its data I once thought users might want to parse their own options, maybe this is a use case for this opaque space. For the name, I thought private_data might be a better candidate than opaque, but I do not insist. > > 3. Export CopySend*() > > * If we like minimum API, we just need to export > CopySendData() and CopySendEndOfRow(). But > CopySend{String,Char,Int32,Int16}() will be convenient > custom COPY TO handlers. (A custom COPY TO handler for > Apache Arrow doesn't need them.) Do you use the arrow library to control the memory? Is there a way that we can let the arrow use postgres' memory context? I'm not sure this is necessary, just raise the question for discussion. > > Questions: > > 1. What value should be used for "format" in > PgMsg_CopyOutResponse message? > > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144 > > It's 1 for binary format and 0 for text/csv format. > > Should we make it customizable by custom COPY TO handler? > If so, what value should be used for this? > > 2. Do we need more tries for design discussion for the first > implementation? If we need, what should we try? > > > Thanks, > -- > kou +PG_FUNCTION_INFO_V1(copy_testfmt_handler); +Datum +copy_testfmt_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);; + extra semicolon. -- Regards Junwang Zhao