Re: Small fix on COPY ON_ERROR document

Yugo NAGATA Fri, 26 Jan 2024 01:10:09 -0800

On Thu, 25 Jan 2024 23:33:22 -0700
"David G. Johnston" <david.g.johns...@gmail.com> wrote:


> On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nag...@sraoss.co.jp> wrote:
> 
> > On Fri, 26 Jan 2024 13:59:09 +0900
> > Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nag...@sraoss.co.jp>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I found that the documentation of COPY ON_ERROR said
> > > > COPY stops operation at the first error when
> > > > "ON_ERROR is not specified.", but it also stop when
> > > > ON_ERROR is specified to the default value, "stop".
> > > >
> >
> >
> I'm finding the current wording surrounding ON_ERROR to not fit in with the
> rest of the page.  I've tried to improve things by being a bit more
> invasive.

Introducing the ON_ERROR option changed the behavior of COPY about error
handling to some extent, so it might be good to rewrite a bit more parts
of the documentation as you suggest.

> This first hunk updates the description of the COPY command to include
> describing the purpose of the ON_ERROR clause.
> 
> I too am concerned about the word "parsed" here, and "malformed" in the
> more detailed descriptions; this would need to be modified to better
> reflect the circumstances under which ignore happens.

I think "errors due to data type incompatibility" would be nice to describe
the errors to be ignored by ON_ERROR, as used in the NOTICE message output.
 
> diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
> index 21a5c4a052..5fe4c9f747 100644
> --- a/doc/src/sgml/ref/copy.sgml
> +++ b/doc/src/sgml/ref/copy.sgml
> @@ -90,6 +90,14 @@ COPY { <replaceable
> class="parameter">table_name</replaceable> [ ( <replaceable
>      in the <structname>pg_stat_progress_copy</structname> view. See
>      <xref linkend="copy-progress-reporting"/> for details.
>    </para>
> +
> +  <para>
> +    By default, <command>COPY</command> will abort if it encounters errors.
> +    For non-binary format <command>COPY FROM</command> the user can specify
> +    to instead ignore input rows that cannot be parsed by specifying
> +    the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
> +    clause.
> +  </para>
>   </refsect1>
> 

> 
> The following was, IMO, too much text about something not to worry about,
> COPY TO and ignore mode.  The point is dead tuples on error and running
> vacuum.  It doesn't really even matter what caused the error, though if the
> error was even before rows started to be imported then obviously there
> would be no dead tuples.
> 
> Oh, and the word "first" really isn't adding anything here that we cannot
> reasonably leave to common sense, IMO.  We either ignore all errors or stop
> on the first one.  There isn't a stop mode that is capable of bypassing
> errors and the ignore mode doesn't have a "first n" option so one assumes
> all errors are ignored.
> 
> @@ -576,15 +583,12 @@ COPY <replaceable
> class="parameter">count</replaceable>
>     </para>
> 
>     <para>
> -    <command>COPY</command> stops operation at the first error when
> -    <literal>ON_ERROR</literal> is not specified. This
> -    should not lead to problems in the event of a <command>COPY
> -    TO</command>, but the target table will already have received
> -    earlier rows in a <command>COPY FROM</command>. These rows will not
> -    be visible or accessible, but they still occupy disk space. This might
> -    amount to a considerable amount of wasted disk space if the failure
> -    happened well into a large copy operation. You might wish to invoke
> -    <command>VACUUM</command> to recover the wasted space.
> +    A failed <command>COPY FROM</command> command will have performed
> +    physical insertion of all rows prior to the malformed one.

As you said that it does not matter what caused the error, I don't think
we have to use "malformed" here. Instead, we could say "we will have
performed physical insertion of rows before the failure."


> +    While these rows will not be visible or accessible, they still occupy
> +    disk space. This might amount to a considerable amount of wasted disk
> +    space if the failure happened well into a large copy operation.
> +    <command>VACUUM</command> should be used to recover the wasted space.
>     </para>
> 
>     <para>
> 
> David J.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nag...@sraoss.co.jp>

Re: Small fix on COPY ON_ERROR document

Reply via email to