Hi,

In <zdbtqj-p5h1_e...@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 22 Feb 2024 15:44:16 +0900,
  Michael Paquier <mich...@paquier.xyz> wrote:

> I was comparing what you have here, and what's been attached by Andres
> at [1] and the top of the changes on my development branch at [2]
> (v3-0008, mostly).  And, it strikes me that there is no need to do any
> major changes in any of the callbacks proposed up to v13 and v14 in
> this thread, as all the changes proposed want to plug in more data
> into each StateData for COPY FROM and COPY TO, the best part being
> that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
> from Sutou-san would need one slight tweak in the per-row callback,
> still that's minor.

I think so too. But I thought that some minor conflicts will
be happen with this and the v15. So I worked on this before
the v15.

We agreed that this optimization doesn't block v15: [1]
So we can work on the v15 without this optimization for now.

[1] 
https://www.postgresql.org/message-id/flat/20240219195351.5vy7cdl3wxia66kg%40awork3.anarazel.de#20f9677e074fb0f8c5bb3994ef059a15

> I have been spending more time on the patch to introduce the COPY
> APIs, leading me to the v15 attached, where I have replaced the
> previous attribute callbacks for the output representation and the
> reads with hardcoded routines that should be optimized by compilers,
> and I have done more profiling with -O2.

Thanks! I wanted to work on it but I didn't have enough time
for it in a few days...

I've reviewed the v15.

----
> @@ -751,8 +751,9 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int 
> nbytes)
>   *
>   * NOTE: force_not_null option are not applied to the returned fields.
>   */
> -bool
> -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> +static bool

"inline" is missing here.

> +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
> +                                       bool is_csv)
>  {
>       int                     fldct;
----

How about adding "is_csv" to CopyReadline() and
CopyReadLineText() too?

----
diff --git a/src/backend/commands/copyfromparse.c 
b/src/backend/commands/copyfromparse.c
index 25b8d4bc52..79fabecc69 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -150,8 +150,8 @@ static const char BinarySignature[11] = 
"PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static inline int CopyReadAttributesText(CopyFromState cstate);
 static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -770,7 +770,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, 
int *nfields,
                tupDesc = RelationGetDescr(cstate->rel);
 
                cstate->cur_lineno++;
-               done = CopyReadLine(cstate);
+               done = CopyReadLine(cstate, is_csv);
 
                if (cstate->opts.header_line == COPY_HEADER_MATCH)
                {
@@ -823,7 +823,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, 
int *nfields,
        cstate->cur_lineno++;
 
        /* Actually read the line into memory here */
-       done = CopyReadLine(cstate);
+       done = CopyReadLine(cstate, is_csv);
 
        /*
         * EOF at start of line means we're done.  If we see EOF after some
@@ -1133,8 +1133,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
        bool            result;
 
@@ -1142,7 +1142,7 @@ CopyReadLine(CopyFromState cstate)
        cstate->line_buf_valid = false;
 
        /* Parse data and transfer into line_buf */
-       result = CopyReadLineText(cstate);
+       result = CopyReadLineText(cstate, is_csv);
 
        if (result)
        {
@@ -1209,8 +1209,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
        char       *copy_input_buf;
        int                     input_buf_ptr;
@@ -1226,7 +1226,7 @@ CopyReadLineText(CopyFromState cstate)
        char            quotec = '\0';
        char            escapec = '\0';
 
-       if (cstate->opts.csv_mode)
+       if (is_csv)
        {
                quotec = cstate->opts.quote[0];
                escapec = cstate->opts.escape[0];
@@ -1306,7 +1306,7 @@ CopyReadLineText(CopyFromState cstate)
                prev_raw_ptr = input_buf_ptr;
                c = copy_input_buf[input_buf_ptr++];
 
-               if (cstate->opts.csv_mode)
+               if (is_csv)
                {
                        /*
                         * If character is '\\' or '\r', we may need to look 
ahead below.
@@ -1345,7 +1345,7 @@ CopyReadLineText(CopyFromState cstate)
                }
 
                /* Process \r */
-               if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+               if (c == '\r' && (!is_csv || !in_quote))
                {
                        /* Check for \r\n on first line, _and_ handle \r\n. */
                        if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1373,10 +1373,10 @@ CopyReadLineText(CopyFromState cstate)
                                        if (cstate->eol_type == EOL_CRNL)
                                                ereport(ERROR,
                                                                
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                                                
!cstate->opts.csv_mode ?
+                                                                !is_csv ?
                                                                 
errmsg("literal carriage return found in data") :
                                                                 
errmsg("unquoted carriage return found in data"),
-                                                                
!cstate->opts.csv_mode ?
+                                                                !is_csv ?
                                                                 errhint("Use 
\"\\r\" to represent carriage return.") :
                                                                 errhint("Use 
quoted CSV field to represent carriage return.")));
 
@@ -1390,10 +1390,10 @@ CopyReadLineText(CopyFromState cstate)
                        else if (cstate->eol_type == EOL_NL)
                                ereport(ERROR,
                                                
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                                !cstate->opts.csv_mode ?
+                                                !is_csv ?
                                                 errmsg("literal carriage 
return found in data") :
                                                 errmsg("unquoted carriage 
return found in data"),
-                                                !cstate->opts.csv_mode ?
+                                                !is_csv ?
                                                 errhint("Use \"\\r\" to 
represent carriage return.") :
                                                 errhint("Use quoted CSV field 
to represent carriage return.")));
                        /* If reach here, we have found the line terminator */
@@ -1401,15 +1401,15 @@ CopyReadLineText(CopyFromState cstate)
                }
 
                /* Process \n */
-               if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+               if (c == '\n' && (!is_csv || !in_quote))
                {
                        if (cstate->eol_type == EOL_CR || cstate->eol_type == 
EOL_CRNL)
                                ereport(ERROR,
                                                
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                                !cstate->opts.csv_mode ?
+                                                !is_csv ?
                                                 errmsg("literal newline found 
in data") :
                                                 errmsg("unquoted newline found 
in data"),
-                                                !cstate->opts.csv_mode ?
+                                                !is_csv ?
                                                 errhint("Use \"\\n\" to 
represent newline.") :
                                                 errhint("Use quoted CSV field 
to represent newline.")));
                        cstate->eol_type = EOL_NL;      /* in case not set yet 
*/
@@ -1421,7 +1421,7 @@ CopyReadLineText(CopyFromState cstate)
                 * In CSV mode, we only recognize \. alone on a line.  This is 
because
                 * \. is a valid CSV data value.
                 */
-               if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+               if (c == '\\' && (!is_csv || first_char_in_line))
                {
                        char            c2;
 
@@ -1454,7 +1454,7 @@ CopyReadLineText(CopyFromState cstate)
 
                                        if (c2 == '\n')
                                        {
-                                               if (!cstate->opts.csv_mode)
+                                               if (!is_csv)
                                                        ereport(ERROR,
                                                                        
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                                                         
errmsg("end-of-copy marker does not match previous newline style")));
@@ -1463,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate)
                                        }
                                        else if (c2 != '\r')
                                        {
-                                               if (!cstate->opts.csv_mode)
+                                               if (!is_csv)
                                                        ereport(ERROR,
                                                                        
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                                                         
errmsg("end-of-copy marker corrupt")));
@@ -1479,7 +1479,7 @@ CopyReadLineText(CopyFromState cstate)
 
                                if (c2 != '\r' && c2 != '\n')
                                {
-                                       if (!cstate->opts.csv_mode)
+                                       if (!is_csv)
                                                ereport(ERROR,
                                                                
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                                                 
errmsg("end-of-copy marker corrupt")));
@@ -1508,7 +1508,7 @@ CopyReadLineText(CopyFromState cstate)
                                result = true;  /* report EOF */
                                break;
                        }
-                       else if (!cstate->opts.csv_mode)
+                       else if (!is_csv)
                        {
                                /*
                                 * If we are here, it means we found a 
backslash followed by
----

> In this case, I have been able to limit the effects of the per-row
> callback by making NextCopyFromRawFields() local to copyfromparse.c
> while applying some inlining to it.  This brings me to a different
> point, why don't we do this change independently on HEAD?

Does this mean that changing

bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)

to (adding "static")

static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int 
*nfields)

not  (adding "static" and "bool is_csv")

static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int 
*nfields, bool is_csv)

improves performance?

If so, adding the change independently on HEAD makes
sense. But I don't know why that improves
performance... Inlining?

>                                                            It's not 
> really complicated to make NextCopyFromRawFields show high in the
> profiles.  I was looking at external projects, and noticed that
> there's nothing calling NextCopyFromRawFields() directly.

It means that we can hide NextCopyFromRawFields() without
breaking compatibility (because nobody uses it), right?

If so, I also think that we can change
NextCopyFromRawFields() directly.

If we assume that someone (not public code) may use it, we
can create a new internal function and use it something
like:

----
diff --git a/src/backend/commands/copyfromparse.c 
b/src/backend/commands/copyfromparse.c
index 7cacd0b752..b1515ead82 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -751,8 +751,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int 
nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int 
*nfields)
 {
        int                     fldct;
        bool            done;
@@ -840,6 +840,12 @@ NextCopyFromRawFields(CopyFromState cstate, char 
***fields, int *nfields)
        return true;
 }
 
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+       return NextCopyFromRawFieldsInternal(cstate, fields, nfields);
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
----


Thanks,
-- 
kou


Reply via email to