On Fri, Feb 02, 2024 at 05:46:18PM +0900, Sutou Kouhei wrote: > Hi, > > In <zbyidhirxrgzy...@paquier.xyz> > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Fri, 2 Feb 2024 17:04:28 +0900, > Michael Paquier <mich...@paquier.xyz> wrote: > > > One idea I was considering is whether we should use a special value in > > the "format" DefElem, say "custom:$my_custom_format" where it would be > > possible to bypass the formay check when processing options and find > > the routines after processing all the options. I'm not wedded to > > that, but attaching the routines to the state data is IMO the correct > > thing, because this has nothing to do with CopyFormatOptions. > > Thanks for sharing your idea. > Let's discuss how to support custom options after we > complete the current performance changes. > > >> I'm OK with the approach. But how about adding the extra > >> callbacks to Copy{From,To}StateData not > >> Copy{From,To}Routines like CopyToStateData::data_dest_cb and > >> CopyFromStateData::data_source_cb? They are only needed for > >> "text" and "csv". So we don't need to add them to > >> Copy{From,To}Routines to keep required callback minimum. > > > > And set them in cstate while we are in the Start routine, right? > > I imagined that it's done around the following part: > > @@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate, > /* Extract options from the statement node tree */ > ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , > options); > > + /* Set format routine */ > + cstate->routine = CopyFromGetRoutine(cstate->opts); > + > /* Process the target relation */ > cstate->rel = rel; > > > Example1: > > /* Set format routine */ > cstate->routine = CopyFromGetRoutine(cstate->opts); > if (!cstate->opts.binary) > if (cstate->opts.csv_mode) > cstate->copy_read_attributes = CopyReadAttributesCSV; > else > cstate->copy_read_attributes = CopyReadAttributesText; > > Example2: > > static void > CopyFromSetRoutine(CopyFromState cstate) > { > if (cstate->opts.csv_mode) > { > cstate->routine = &CopyFromRoutineCSV; > cstate->copy_read_attributes = CopyReadAttributesCSV; > } > else if (cstate.binary) > cstate->routine = &CopyFromRoutineBinary; > else > { > cstate->routine = &CopyFromRoutineText; > cstate->copy_read_attributes = CopyReadAttributesText; > } > } > > BeginCopyFrom() > { > /* Set format routine */ > CopyFromSetRoutine(cstate); > } > > > But I don't object your original approach. If we have the > extra callbacks in Copy{From,To}Routines, I just don't use > them for my custom format extension. > > >> What is the better next action for us? Do you want to > >> complete the WIP v11 patch set by yourself (and commit it)? > >> Or should I take over it? > > > > I was planning to work on that, but wanted to be sure how you felt > > about the problem with text and csv first. > > OK. > My opinion is the above. I have an idea how to implement it > but it's not a strong idea. You can choose whichever you like.
So, I've looked at all that today, and finished by applying two patches as of 2889fd23be56 and 95fb5b49024a to get some of the weirdness with the workhorse routines out of the way. Both have added callbacks assigned in their respective cstate data for text and csv. As this is called within the OneRow routine, I can live with that. If there is an opposition to that, we could just attach it within the routines. The CopyAttributeOut routines had a strange argument layout, actually, the flag for the quotes is required as a header uses no quotes, but there was little point in the "single arg" case, so I've removed it. I am attaching a v12 which is close to what I want it to be, with much more documentation and comments. There are two things that I've changed compared to the previous versions though: 1) I have added a callback to set up the input and output functions rather than attach that in the Start callback. These routines are now called once per argument, where we know that the argument is valid. The callbacks are in charge of filling the FmgrInfos. There are some good reasons behind that: - No need for plugins to think about how to allocate this data. v11 and other versions were doing things the wrong way by allocating this stuff in the wrong memory context as we switch to the COPY context when we are in the Start routines. - This avoids attisdropped problems, and we have a long history of bugs regarding that. I'm ready to bet that custom formats would get that wrong. 2) I have backpedaled on the postpare callback, which did not bring much in clarity IMO while being a CSV-only callback. Note that we have in copyfromparse.c more paths that are only for CSV but the past versions of the patch never cared about that. This makes the text and CSV implementations much closer to each other, as a result. I had mixed feelings about CopySendEndOfRow() being split to CopyToTextSendEndOfRow() to send the line terminations when sending a CSV/text row, but I'm OK with that at the end. v12 is mostly about moving code around at this point, making it kind of straight-forward to follow as the code blocks are the same. I'm still planning to do a few more measurements, just lacked of time. Let me know if you have comments about all that. -- Michael
From 964ae392815abbab70f501c6cf430134aa2c5e08 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Mon, 5 Feb 2024 15:38:12 +0900 Subject: [PATCH v12] Extract COPY FROM/TO format implementations This doesn't change the current behavior. This just introduces a set of copy routines called CopyFromRoutine and CopyToRoutine, which just has function pointers of format implementation like TupleTableSlotOps, and use it for existing "text", "csv" and "binary" format implementations. There are plans to extend that more in the future with custom formats. This improves performance when a relation has many attributes as this eliminates format-based if branches. The more the attributes, the better the performance gain. Blah add some numbers. --- src/include/commands/copyapi.h | 99 +++++ src/include/commands/copyfrom_internal.h | 10 + src/backend/commands/copyfrom.c | 207 ++++++++--- src/backend/commands/copyfromparse.c | 362 ++++++++++--------- src/backend/commands/copyto.c | 438 +++++++++++++++-------- src/tools/pgindent/typedefs.list | 2 + 6 files changed, 767 insertions(+), 351 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 0000000000..06b7bba8f3 --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,99 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO/FROM handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* These are private in commands/copy[from|to].c */ +typedef struct CopyFromStateData *CopyFromState; +typedef struct CopyToStateData *CopyToState; + +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Called when COPY FROM is started to set up the input functions + * associated to the relation's attributes writing to. `fmgr_info` can be + * optionally filled to provide the catalog information of the input + * function. `typioparam` can be optinally filled to define the OID of + * the type to pass to the input function. `atttypid` is the OID of data + * type used by the relation's attribute. + */ + void (*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo, + Oid *typioparam); + + /* + * Called when COPY FROM is started. + * + * `tupDesc` is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to copy. + */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* Called when COPY FROM has ended. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Called when COPY TO is started to set up the output functions + * associated to the relation's attributes reading from. `fmgr_info` can + * be optionally filled. `atttypid` is the OID of data type used by the + * relation's attribute. + */ + void (*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo); + + /* + * Called when COPY TO is started. + * + * `tupDesc` is the tuple descriptor of the relation from where the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Copy one row for COPY TO. + * + * `slot` is the tuple slot where the data is emitted. + */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO has ended */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 759f8e3d09..f616d58b1f 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,6 +15,7 @@ #define COPYFROM_INTERNAL_H #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -65,6 +66,9 @@ typedef int (*CopyReadAttributes) (CopyFromState cstate); */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ @@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate); extern int CopyReadAttributesCSV(CopyFromState cstate); extern int CopyReadAttributesText(CopyFromState cstate); +/* Callbacks for CopyFromRoutine->OneRow */ +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 41f6bc43e4..de725b3544 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -108,6 +108,158 @@ static char *limit_printout_length(const char *str); static void ClosePipeFromProgram(CopyFromState cstate); + +/* + * CopyFromRoutine implementations for text and CSV. + */ + +/* + * CopyFromTextInFunc + * + * Assign input function data for a relation's attribute in text/CSV format. + */ +static void +CopyFromTextInFunc(Oid atttypid, FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromTextStart + * + * Start of COPY FROM for text/CSV format. + */ +static void +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* create workspace for CopyReadAttributes results */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); + + /* Set read attribute callback */ + if (cstate->opts.csv_mode) + cstate->copy_read_attributes = CopyReadAttributesCSV; + else + cstate->copy_read_attributes = CopyReadAttributesText; +} + +/* + * CopyFromTextEnd + * + * End of COPY FROM for text/CSV format. + */ +static void +CopyFromTextEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * CopyFromRoutine implementation for "binary". + */ + +/* + * CopyFromBinaryInFunc + * + * Assign input function data for a relation's attribute in binary format. + */ +static void +CopyFromBinaryInFunc(Oid atttypid, FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeBinaryInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromTextStart + * + * Start of COPY FROM for binary format. + */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +/* + * CopyFromTextEnd + * + * End of COPY FROM for binary format. + */ +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * Routines assigned to each format. ++ + * CSV and text share the same implementation, at the exception of the + * copy_read_attributes callback. + */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextInFunc, + .CopyFromStart = CopyFromTextStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextEnd, +}; + +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextInFunc, + .CopyFromStart = CopyFromTextStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextEnd, +}; + +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* + * Define the COPY FROM routines to use for a format. + */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; + + /* default is text */ + return &CopyFromRoutineText; +} + + /* * error context callback for COPY FROM * @@ -1386,7 +1538,6 @@ BeginCopyFrom(ParseState *pstate, num_defaults; FmgrInfo *in_functions; Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1418,6 +1569,9 @@ BeginCopyFrom(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyFromGetRoutine(cstate->opts); + /* Process the target relation */ cstate->rel = rel; @@ -1571,25 +1725,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1622,13 +1757,9 @@ BeginCopyFrom(ParseState *pstate, continue; /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); + cstate->routine->CopyFromInFunc(att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1763,25 +1894,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - - if (cstate->opts.csv_mode) - cstate->copy_read_attributes = CopyReadAttributesCSV; - else - cstate->copy_read_attributes = CopyReadAttributesText; - } + cstate->routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1794,6 +1907,8 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. */ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 906756362e..c45f9ae134 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -832,6 +832,200 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +/* + * CopyFromTextOneRow + * + * Copy one row to a set of `values` and `nulls` for the text and CSV + * formats. + */ +bool +CopyFromTextOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. */ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + if (cstate->opts.csv_mode) + { + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert + * it to the NULL string. + */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL + * string. It must have been quoted, or otherwise the string + * would already have been set to NULL. Convert it to NULL as + * specified. + */ + string = NULL; + } + } + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. + */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + cstate->num_errors++; + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + +/* + * CopyFromBinaryOneRow + * + * Copy one row to a set of `values` and `nulls` for the binary format. + */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. + */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * @@ -841,7 +1035,8 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) * read from the file, and DEFAULT option is unset. * * 'values' and 'nulls' arrays must be the same length as columns of the - * relation passed to BeginCopyFrom. This function fills the arrays. + * relation passed to BeginCopyFrom. This function calls the format routine + * that should fill the arrays. */ bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, @@ -849,181 +1044,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. - */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - cstate->num_errors++; - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. - */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, + nulls)) + return false; /* * Now compute and insert any defaults available for the columns not diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 52b42f8a52..fdf5dd1aa8 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -24,6 +24,7 @@ #include "access/xact.h" #include "access/xlog.h" #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -79,6 +80,9 @@ typedef void (*CopyAttributeOut) (CopyToState cstate, const char *string, */ typedef struct CopyToStateData { + /* format routine */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -143,6 +147,291 @@ static void CopySendEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * CopyToRoutine implementations. + */ + +/* + * CopyToTextSendEndOfRow + * + * Apply line terminations for a line sent in text or CSV format depending + * on the destination, then send the end of a row. + */ +static inline void +CopyToTextSendEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + +/* + * CopyToTextStart + * + * Start of COPY TO for text and CSV format. + */ +static void +CopyToTextStart(CopyToState cstate, TupleDesc tupDesc) +{ + ListCell *cur; + + /* Set output representation callback */ + if (cstate->opts.csv_mode) + cstate->copy_attribute_out = CopyAttributeOutCSV; + else + cstate->copy_attribute_out = CopyAttributeOutText; + + /* + * For non-binary copy, we need to convert null_print to file encoding, + * because it will be sent directly with CopySendString. + */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + /* Ignore quotes */ + cstate->copy_attribute_out(cstate, colname, false); + } + + CopyToTextSendEndOfRow(cstate); + } +} + +/* + * CopyToTextOutFunc + * + * Assign output function data for a relation's attribute in text/CSV format. + */ +static void +CopyToTextOutFunc(Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* + * CopyToTextOneRow + * + * Process one row for text/CSV format. + */ +static void +CopyToTextOneRow(CopyToState cstate, + TupleTableSlot *slot) +{ + bool need_delim = false; + FmgrInfo *out_functions = cstate->out_functions; + ListCell *cur; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + + if (isnull) + { + CopySendString(cstate, cstate->opts.null_print_client); + } + else + { + char *string; + + string = OutputFunctionCall(&out_functions[attnum - 1], value); + + cstate->copy_attribute_out(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + } + } + + CopyToTextSendEndOfRow(cstate); +} + +/* + * CopyToTextEnd + * + * End of COPY TO for text/CSV format. + */ +static void +CopyToTextEnd(CopyToState cstate) +{ + /* Nothing to do here */ +} + +/* + * CopyToRoutine implementation for "binary". + */ + +/* + * CopyToBinaryStart + * + * Start of COPY TO for binary format. + */ +static void +CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* Generate header for a binary copy */ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); +} + +/* + * CopyToBinaryOutFunc + * + * Assign output function data for a relation's attribute in binary format. + */ +static void +CopyToBinaryOutFunc(Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* + * CopyToBinaryOneRow + * + * Process one row for binary format. + */ +static void +CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + ListCell *cur; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + { + CopySendInt32(cstate, -1); + } + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +/* + * CopyToBinaryEnd + * + * End of COPY TO for binary format. + */ +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} + +/* + * CSV and text share the same implementation, at the exception of the + * output representation function. + */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextStart, + .CopyToOutFunc = CopyToTextOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextEnd, +}; + +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextStart, + .CopyToOutFunc = CopyToTextOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextEnd, +}; + +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* + * Define the COPY TO routines to use for a format. This should be called + * after options are parsed. + */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} /* * Send copy start/stop messages for frontend copies. These have changed @@ -210,16 +499,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -254,10 +533,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -445,14 +720,8 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); - /* Set output representation callback */ - if (!cstate->opts.binary) - { - if (cstate->opts.csv_mode) - cstate->copy_attribute_out = CopyAttributeOutCSV; - else - cstate->copy_attribute_out = CopyAttributeOutText; - } + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); /* Process the source/target relation or query */ if (rel) @@ -791,19 +1060,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + cstate->routine->CopyToOutFunc(attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -816,54 +1076,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - /* Ignore quotes */ - cstate->copy_attribute_out(cstate, colname, false); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -902,13 +1115,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -924,68 +1131,15 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - bool need_delim = false; - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; - ListCell *cur; - char *string; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - - if (!cstate->opts.binary) - { - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - } - - if (isnull) - { - if (!cstate->opts.binary) - CopySendString(cstate, cstate->opts.null_print_client); - else - CopySendInt32(cstate, -1); - } - else - { - if (!cstate->opts.binary) - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - - cstate->copy_attribute_out(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - } - else - { - bytea *outputbytes; - - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 91433d439b..d02a7773e3 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -473,6 +473,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice @@ -482,6 +483,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.43.0
signature.asc
Description: PGP signature