On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote:
> As I mentioned in a separate email, if we use the OS default EOL as
> the default EOL in raw format, it would not be necessary to allow it
> to be multi characters. I think it's worth considering it.

I like the idea, but not sure I understand how it would work.

What if a user's OS default is \n (LF) and this user wants
to import a Windows text file \r\n (CR LR), which is a
multi characters EOL delimiter.

Was your idea to make an exception for that particular EOL,
or to simply not support that edge case?

> ---
>  * copyfromparse.c
>  *      Parse CSV/text/binary format for COPY FROM.
>
> We need to update here as well.
>
> --
> - * 4. CopyReadAttributesText/CSV() function takes the input line from
> + * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
>
> I think we don't have CopyReadAttributesRaw() function now.
>
> ---
> I think it would be better to explain how to parse data in raw mode,
> especially which steps in the pipeline we skip, in the comment at the
> top of copyfromparse.c.
>
> ---
> -       if (cstate->opts.format == COPY_FORMAT_CSV)
> +       if (cstate->opts.format == COPY_FORMAT_RAW)
>         {
> -           /*
> -            * If character is '\r', we may need to look ahead below.  Force
> -            * fetch of the next character if we don't already have it.  We
> -            * need to do this before changing CSV state, in case '\r' is also
> -            * the quote or escape character.
> -            */
> -           if (c == '\r')
> +           if (delim_len == 0)
>             {
> -               IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
> +               /* When reading entire file, consume all remaining
> bytes at once */
> +               input_buf_ptr = copy_buf_len;
> +               continue;
>             }
> +           else
>
> I think that the code become much simpler if we do 'continue' at the
> end of  'if (cstate->opts.format == COPY_FORMAT_RAW)' branch, instead
> of doing 'if (cstate->opts.format == COPY_FORMAT_RAW) {} else ...'.
>
> ---
> @@ -1461,6 +1536,7 @@ CopyReadLineText(CopyFromState cstate)
>     return result;
>  }
>
> +
>  /*
>   * Return decimal value for a hexadecimal digit
>   */
> @@ -1937,7 +2013,6 @@ endfield:
>     return fieldno;
>  }
>
> -
>  /*
>   * Read a binary attribute
>   */
>
> Unnecessary line addition and removal.

Thanks for review, I agree on all comments,
awaiting your comment on EOL before implementing changes.

/Joel


Reply via email to