On Thu, 25 Jan 2024 23:33:22 -0700 "David G. Johnston" <david.g.johns...@gmail.com> wrote:
> On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nag...@sraoss.co.jp> wrote: > > > On Fri, 26 Jan 2024 13:59:09 +0900 > > Masahiko Sawada <sawada.m...@gmail.com> wrote: > > > > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nag...@sraoss.co.jp> > > wrote: > > > > > > > > Hi, > > > > > > > > I found that the documentation of COPY ON_ERROR said > > > > COPY stops operation at the first error when > > > > "ON_ERROR is not specified.", but it also stop when > > > > ON_ERROR is specified to the default value, "stop". > > > > > > > > > I'm finding the current wording surrounding ON_ERROR to not fit in with the > rest of the page. I've tried to improve things by being a bit more > invasive. Introducing the ON_ERROR option changed the behavior of COPY about error handling to some extent, so it might be good to rewrite a bit more parts of the documentation as you suggest. > This first hunk updates the description of the COPY command to include > describing the purpose of the ON_ERROR clause. > > I too am concerned about the word "parsed" here, and "malformed" in the > more detailed descriptions; this would need to be modified to better > reflect the circumstances under which ignore happens. I think "errors due to data type incompatibility" would be nice to describe the errors to be ignored by ON_ERROR, as used in the NOTICE message output. > diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml > index 21a5c4a052..5fe4c9f747 100644 > --- a/doc/src/sgml/ref/copy.sgml > +++ b/doc/src/sgml/ref/copy.sgml > @@ -90,6 +90,14 @@ COPY { <replaceable > class="parameter">table_name</replaceable> [ ( <replaceable > in the <structname>pg_stat_progress_copy</structname> view. See > <xref linkend="copy-progress-reporting"/> for details. > </para> > + > + <para> > + By default, <command>COPY</command> will abort if it encounters errors. > + For non-binary format <command>COPY FROM</command> the user can specify > + to instead ignore input rows that cannot be parsed by specifying > + the <literal>ignore</literal> option to the <literal>ON_ERROR</literal> > + clause. > + </para> > </refsect1> > > > The following was, IMO, too much text about something not to worry about, > COPY TO and ignore mode. The point is dead tuples on error and running > vacuum. It doesn't really even matter what caused the error, though if the > error was even before rows started to be imported then obviously there > would be no dead tuples. > > Oh, and the word "first" really isn't adding anything here that we cannot > reasonably leave to common sense, IMO. We either ignore all errors or stop > on the first one. There isn't a stop mode that is capable of bypassing > errors and the ignore mode doesn't have a "first n" option so one assumes > all errors are ignored. > > @@ -576,15 +583,12 @@ COPY <replaceable > class="parameter">count</replaceable> > </para> > > <para> > - <command>COPY</command> stops operation at the first error when > - <literal>ON_ERROR</literal> is not specified. This > - should not lead to problems in the event of a <command>COPY > - TO</command>, but the target table will already have received > - earlier rows in a <command>COPY FROM</command>. These rows will not > - be visible or accessible, but they still occupy disk space. This might > - amount to a considerable amount of wasted disk space if the failure > - happened well into a large copy operation. You might wish to invoke > - <command>VACUUM</command> to recover the wasted space. > + A failed <command>COPY FROM</command> command will have performed > + physical insertion of all rows prior to the malformed one. As you said that it does not matter what caused the error, I don't think we have to use "malformed" here. Instead, we could say "we will have performed physical insertion of rows before the failure." > + While these rows will not be visible or accessible, they still occupy > + disk space. This might amount to a considerable amount of wasted disk > + space if the failure happened well into a large copy operation. > + <command>VACUUM</command> should be used to recover the wasted space. > </para> > > <para> > > David J. Regards, Yugo Nagata -- Yugo NAGATA <nag...@sraoss.co.jp>