Re: Small fix on COPY ON_ERROR document
On Sat, 3 Feb 2024 01:51:56 +0200 Alexander Korotkov wrote: > On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston > wrote: > > On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA wrote: > >> > >> > >> I attached a updated patch including fixes you pointed out above. > >> > > > > Removed "which"; changed "occupying" to "occupy" > > Removed on of the two "amounts" > > Changed "unacceptable to the input function" to just "converting" as that > > is what the average reader is more likely to be thinking. > > The rest was good. > > Thanks to everybody in the thread. > Pushed. Thanks! > > -- > Regards, > Alexander Korotkov -- Yugo NAGATA
Re: Small fix on COPY ON_ERROR document
On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston wrote: > On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA wrote: >> >> >> I attached a updated patch including fixes you pointed out above. >> > > Removed "which"; changed "occupying" to "occupy" > Removed on of the two "amounts" > Changed "unacceptable to the input function" to just "converting" as that is > what the average reader is more likely to be thinking. > The rest was good. Thanks to everybody in the thread. Pushed. -- Regards, Alexander Korotkov
Re: Small fix on COPY ON_ERROR document
On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA wrote: > > I attached a updated patch including fixes you pointed out above. > > Removed "which"; changed "occupying" to "occupy" Removed on of the two "amounts" Changed "unacceptable to the input function" to just "converting" as that is what the average reader is more likely to be thinking. The rest was good. diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 3c2feaa11a..55764fc1f2 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -385,8 +385,8 @@ COPY { table_name [ ( ON_ERROR - Specifies which how to behave when encountering an error due to column values - unacceptable to the input function of each attribute's data type. + Specifies how to behave when encountering an error converting a column's + input value into its data type. An error_action value of stop means fail the command, while ignore means discard the input row and continue with the next one. @@ -589,7 +589,7 @@ COPY count The COPY FROM command physically inserts input rows into the table as it progresses. If the command fails, these rows are left in a deleted state; these rows will not be visible, but still -occupying disk space. This might amount to a considerable amount of +occupy disk space. This might amount to considerable wasted disk space if the failure happened well into a large copy operation. VACUUM should be used to recover the wasted space. David J.
Re: Small fix on COPY ON_ERROR document
On Fri, 02 Feb 2024 11:29:41 +0900 torikoshia wrote: > On 2024-02-01 15:16, Yugo NAGATA wrote: > > On Mon, 29 Jan 2024 15:47:25 +0900 > > Yugo NAGATA wrote: > > > >> On Sun, 28 Jan 2024 19:14:58 -0700 > >> "David G. Johnston" wrote: > >> > >> > > Also, I think "invalid input syntax" is a bit ambiguous. For example, > >> > > COPY FROM raises an error when the number of input column does not > >> > > match > >> > > to the table schema, but this error is not ignored by ON_ERROR while > >> > > this seems to fall into the category of "invalid input syntax". > >> > > >> > > >> > > >> > It is literally the error text that appears if one were not to ignore it. > >> > It isn’t a category of errors. But I’m open to ideas here. But being > >> > explicit with what on actually sees in the system seemed preferable to > >> > inventing new classification terms not otherwise used. > >> > >> Thank you for explanation! I understood the words was from the error > >> messages > >> that users actually see. However, as Torikoshi-san said in [1], errors > >> other > >> than valid input syntax (e.g. range error) can be also ignored, > >> therefore it > >> would be better to describe to be ignored errors more specifically. > >> > >> [1] > >> https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com > >> > >> > > >> > > > >> > > So, keeping consistency with the existing description, we can say: > >> > > > >> > > "Specifies which how to behave when encountering an error due to > >> > > column values unacceptable to the input function of each attribute's > >> > > data type." > >> > > >> > > >> > Yeah, I was considering something along those lines as an option as well. > >> > But I’d rather add that wording to the glossary. > >> > >> Although I am still be not convinced if we have to introduce the words > >> "soft error" to the documentation, I don't care it if there are no > >> other > >> opposite opinions. > > > > Attached is a updated patch v3, which is a version that uses the above > > wording instead of "soft error". > > > >> > > >> > > Currently, ON_ERROR doesn't support other soft errors, so it can > >> > > explain > >> > > it more simply without introducing the new concept, "soft error" to > >> > > users. > >> > > > >> > > > >> > Good point. Seems we should define what user-facing errors are ignored > >> > anywhere in the system and if we aren’t consistently leveraging these in > >> > all areas/commands make the necessary qualifications in those specific > >> > places. > >> > > >> > >> > > I think "left in a deleted state" is also unclear for users because > >> > > this > >> > > explains the internal state but not how looks from user's view.How > >> > > about > >> > > leaving the explanation "These rows will not be visible or accessible" > >> > > in > >> > > the existing statement? > >> > > > >> > > >> > Just visible then, I don’t like an “or” there and as tuples at least they > >> > are accessible to the system, in vacuum especially. But I expected the > >> > user to understand “as if you deleted it” as their operational concept > >> > more > >> > readily than visible. I think this will be read by people who haven’t > >> > read > >> > MVCC to fully understand what visible means but know enough to run vacuum > >> > to clean up updated and deleted data as a rule. > >> > >> Ok, I agree we can omit "or accessible". How do you like the > >> followings? > >> Still redundant? > >> > >> "If the command fails, these rows are left in a deleted state; > >> these rows will not be visible, but they still occupy disk space. " > > > > Also, the above statement is used in the patch. > > Thanks for updating the patch! > > I like your description which doesn't use the word soft error. Thank you for your comments! > > Here are minor comments: > > > + ignore means discard the input row and > > continue with the next one. > > + The default is stop > > Is "." required at the end of the line? > > > An NOTICE level context message containing the > > ignored row count is > > Should 'An' be 'A'? > > Also, I wasn't sure the necessity of 'context'. > It might be possible to just say "A NOTICE message containing the > ignored row count.." > considering below existing descriptions: > >doc/src/sgml/pltcl.sgml: a NOTICE message each > time a supported command is >doc/src/sgml/pltcl.sgml- executed: > >doc/src/sgml/plpgsql.sgml: This example trigger simply raises a > NOTICE message >doc/src/sgml/plpgsql.sgml- each time a supported command is > executed. I attached a updated patch including fixes you pointed out above. Regards, Yugo Nagata > -- > Regards, > > -- > Atsushi Torikoshi > NTT DATA Group Corporation -- Yugo NAGATA diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..3c2feaa11a 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -90,6 +90,13 @@ COPY {
Re: Small fix on COPY ON_ERROR document
On 2024-02-01 15:16, Yugo NAGATA wrote: On Mon, 29 Jan 2024 15:47:25 +0900 Yugo NAGATA wrote: On Sun, 28 Jan 2024 19:14:58 -0700 "David G. Johnston" wrote: > > Also, I think "invalid input syntax" is a bit ambiguous. For example, > > COPY FROM raises an error when the number of input column does not match > > to the table schema, but this error is not ignored by ON_ERROR while > > this seems to fall into the category of "invalid input syntax". > > > > It is literally the error text that appears if one were not to ignore it. > It isn’t a category of errors. But I’m open to ideas here. But being > explicit with what on actually sees in the system seemed preferable to > inventing new classification terms not otherwise used. Thank you for explanation! I understood the words was from the error messages that users actually see. However, as Torikoshi-san said in [1], errors other than valid input syntax (e.g. range error) can be also ignored, therefore it would be better to describe to be ignored errors more specifically. [1] https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com > > > > > So, keeping consistency with the existing description, we can say: > > > > "Specifies which how to behave when encountering an error due to > > column values unacceptable to the input function of each attribute's > > data type." > > > Yeah, I was considering something along those lines as an option as well. > But I’d rather add that wording to the glossary. Although I am still be not convinced if we have to introduce the words "soft error" to the documentation, I don't care it if there are no other opposite opinions. Attached is a updated patch v3, which is a version that uses the above wording instead of "soft error". > > > Currently, ON_ERROR doesn't support other soft errors, so it can explain > > it more simply without introducing the new concept, "soft error" to users. > > > > > Good point. Seems we should define what user-facing errors are ignored > anywhere in the system and if we aren’t consistently leveraging these in > all areas/commands make the necessary qualifications in those specific > places. > > > I think "left in a deleted state" is also unclear for users because this > > explains the internal state but not how looks from user's view.How about > > leaving the explanation "These rows will not be visible or accessible" in > > the existing statement? > > > > Just visible then, I don’t like an “or” there and as tuples at least they > are accessible to the system, in vacuum especially. But I expected the > user to understand “as if you deleted it” as their operational concept more > readily than visible. I think this will be read by people who haven’t read > MVCC to fully understand what visible means but know enough to run vacuum > to clean up updated and deleted data as a rule. Ok, I agree we can omit "or accessible". How do you like the followings? Still redundant? "If the command fails, these rows are left in a deleted state; these rows will not be visible, but they still occupy disk space. " Also, the above statement is used in the patch. Thanks for updating the patch! I like your description which doesn't use the word soft error. Here are minor comments: + ignore means discard the input row and continue with the next one. + The default is stop Is "." required at the end of the line? An NOTICE level context message containing the ignored row count is Should 'An' be 'A'? Also, I wasn't sure the necessity of 'context'. It might be possible to just say "A NOTICE message containing the ignored row count.." considering below existing descriptions: doc/src/sgml/pltcl.sgml: a NOTICE message each time a supported command is doc/src/sgml/pltcl.sgml- executed: doc/src/sgml/plpgsql.sgml: This example trigger simply raises a NOTICE message doc/src/sgml/plpgsql.sgml- each time a supported command is executed. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: Small fix on COPY ON_ERROR document
On Mon, 29 Jan 2024 15:47:25 +0900 Yugo NAGATA wrote: > On Sun, 28 Jan 2024 19:14:58 -0700 > "David G. Johnston" wrote: > > > > Also, I think "invalid input syntax" is a bit ambiguous. For example, > > > COPY FROM raises an error when the number of input column does not match > > > to the table schema, but this error is not ignored by ON_ERROR while > > > this seems to fall into the category of "invalid input syntax". > > > > > > > > It is literally the error text that appears if one were not to ignore it. > > It isn’t a category of errors. But I’m open to ideas here. But being > > explicit with what on actually sees in the system seemed preferable to > > inventing new classification terms not otherwise used. > > Thank you for explanation! I understood the words was from the error messages > that users actually see. However, as Torikoshi-san said in [1], errors other > than valid input syntax (e.g. range error) can be also ignored, therefore it > would be better to describe to be ignored errors more specifically. > > [1] > https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com > > > > > > > > > So, keeping consistency with the existing description, we can say: > > > > > > "Specifies which how to behave when encountering an error due to > > > column values unacceptable to the input function of each attribute's > > > data type." > > > > > > Yeah, I was considering something along those lines as an option as well. > > But I’d rather add that wording to the glossary. > > Although I am still be not convinced if we have to introduce the words > "soft error" to the documentation, I don't care it if there are no other > opposite opinions. Attached is a updated patch v3, which is a version that uses the above wording instead of "soft error". > > > > > Currently, ON_ERROR doesn't support other soft errors, so it can explain > > > it more simply without introducing the new concept, "soft error" to users. > > > > > > > > Good point. Seems we should define what user-facing errors are ignored > > anywhere in the system and if we aren’t consistently leveraging these in > > all areas/commands make the necessary qualifications in those specific > > places. > > > > > > I think "left in a deleted state" is also unclear for users because this > > > explains the internal state but not how looks from user's view.How about > > > leaving the explanation "These rows will not be visible or accessible" in > > > the existing statement? > > > > > > > Just visible then, I don’t like an “or” there and as tuples at least they > > are accessible to the system, in vacuum especially. But I expected the > > user to understand “as if you deleted it” as their operational concept more > > readily than visible. I think this will be read by people who haven’t read > > MVCC to fully understand what visible means but know enough to run vacuum > > to clean up updated and deleted data as a rule. > > Ok, I agree we can omit "or accessible". How do you like the followings? > Still redundant? > > "If the command fails, these rows are left in a deleted state; > these rows will not be visible, but they still occupy disk space. " Also, the above statement is used in the patch. Regards, Yugo Nagata > Regards, > Yugo Nagata > > -- > Yugo NAGATA > > -- Yugo NAGATA diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..10cfc3f0ad 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -90,6 +90,13 @@ COPY { table_name [ ( pg_stat_progress_copy view. See for details. + + +By default, COPY will fail if it encounters an error +during processing. For use cases where a best-effort attempt at loading +the entire file is desired, the ON_ERROR clause can +be used to specify some other behavior. + @@ -378,17 +385,20 @@ COPY { table_name [ ( ON_ERROR - Specifies which - error_action to perform when there is malformed data in the input. - Currently, only stop (default) and ignore - values are supported. - If the stop value is specified, - COPY stops operation at the first error. - If the ignore value is specified, - COPY skips malformed data and continues copying data. - The option is allowed only in COPY FROM. - Only stop value is allowed when - using binary format. + Specifies which how to behave when encountering an error due to column values + unacceptable to the input function of each attribute's data type. + An error_action value of + stop means fail the command, while + ignore means discard the input row and continue with the next one. + The default is stop + + + The ignore option is applicable only for COPY FROM + when the FORMAT is text or csv. + + + An NOTICE level context message containing the ignored row count is + emitted at the end of the COPY FROM
Re: Small fix on COPY ON_ERROR document
On Sun, 28 Jan 2024 19:14:58 -0700 "David G. Johnston" wrote: > > Also, I think "invalid input syntax" is a bit ambiguous. For example, > > COPY FROM raises an error when the number of input column does not match > > to the table schema, but this error is not ignored by ON_ERROR while > > this seems to fall into the category of "invalid input syntax". > > > > It is literally the error text that appears if one were not to ignore it. > It isn’t a category of errors. But I’m open to ideas here. But being > explicit with what on actually sees in the system seemed preferable to > inventing new classification terms not otherwise used. Thank you for explanation! I understood the words was from the error messages that users actually see. However, as Torikoshi-san said in [1], errors other than valid input syntax (e.g. range error) can be also ignored, therefore it would be better to describe to be ignored errors more specifically. [1] https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com > > > > > So, keeping consistency with the existing description, we can say: > > > > "Specifies which how to behave when encountering an error due to > > column values unacceptable to the input function of each attribute's > > data type." > > > Yeah, I was considering something along those lines as an option as well. > But I’d rather add that wording to the glossary. Although I am still be not convinced if we have to introduce the words "soft error" to the documentation, I don't care it if there are no other opposite opinions. > > > Currently, ON_ERROR doesn't support other soft errors, so it can explain > > it more simply without introducing the new concept, "soft error" to users. > > > > > Good point. Seems we should define what user-facing errors are ignored > anywhere in the system and if we aren’t consistently leveraging these in > all areas/commands make the necessary qualifications in those specific > places. > > > I think "left in a deleted state" is also unclear for users because this > > explains the internal state but not how looks from user's view.How about > > leaving the explanation "These rows will not be visible or accessible" in > > the existing statement? > > > > Just visible then, I don’t like an “or” there and as tuples at least they > are accessible to the system, in vacuum especially. But I expected the > user to understand “as if you deleted it” as their operational concept more > readily than visible. I think this will be read by people who haven’t read > MVCC to fully understand what visible means but know enough to run vacuum > to clean up updated and deleted data as a rule. Ok, I agree we can omit "or accessible". How do you like the followings? Still redundant? "If the command fails, these rows are left in a deleted state; these rows will not be visible, but they still occupy disk space. " Regards, Yugo Nagata -- Yugo NAGATA
Re: Small fix on COPY ON_ERROR document
On 2024-01-27 00:04, David G. Johnston wrote: On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA wrote: On Fri, 26 Jan 2024 00:00:57 -0700 "David G. Johnston" wrote: I will need to make this tweak and probably a couple others to my own suggestions in 12 hours or so. And here is my v2. Notably I choose to introduce the verbiage "soft error" and then define in the ON_ERROR clause the specific soft error that matters here - "invalid input syntax". I also note the log message behavior when ignore mode is chosen. I haven't confirmed that it is accurate but that is readily tweaked if approved of. David J. Thanks for refining the doc. + Specifies which how to behave when encountering a soft error. To be consistent with other parts in the manual[1][2], should be “soft” error? + An error_action value of + stop means fail the command, while + ignore means discard the input row and continue with the next one. + The default is stop Is "." required at the end of the line? + + The only relevant soft error is "invalid input syntax", which manifests when attempting + to create a column value from the text input. + I think it is not restricted to "invalid input syntax". We can handle out of range error: =# create table t1(i int); CREATE TABLE =# copy t1 from stdin with(ON_ERROR ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1 >> \. NOTICE: 1 row was skipped due to data type incompatibility COPY 0 Also, I'm a little concerned that users might wonder what soft error is. Certainly there are already references to "soft" errors in the manual, but they seem to be for developer, such as creating new TYPE for PostgreSQL. It might be better to describe what soft error is like below: -- src/backend/utils/fmgr/README An error reported "softly" must be safe, in the sense that there is no question about our ability to continue normal processing of the transaction. [1] https://www.postgresql.org/docs/devel/sql-createtype.html [2] https://www.postgresql.org/docs/devel/functions-info.html -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: Small fix on COPY ON_ERROR document
On Fri, 26 Jan 2024 08:04:45 -0700 "David G. Johnston" wrote: > On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA wrote: > > > On Fri, 26 Jan 2024 00:00:57 -0700 > > "David G. Johnston" wrote: > > > > > I will need to make this tweak and probably a couple others to my own > > > suggestions in 12 hours or so. > > > > > > > > And here is my v2. > > Notably I choose to introduce the verbiage "soft error" and then define in > the ON_ERROR clause the specific soft error that matters here - "invalid > input syntax". I am not sure we should use "soft error" without any explanation because it seems to me that the meaning of words is unclear for users. This is explained in src/backend/utils/fmgr/README; * Some callers of datatype input functions (and in future perhaps other classes of functions) pass an instance of ErrorSaveContext. This indicates that the caller wishes to handle "soft" errors without a transaction-terminating exception being thrown: instead, the callee should store information about the error cause in the ErrorSaveContext struct and return a dummy result value. However, this is not mentioned in the documentation anywhere. So, it would be hard for users to understand what "soft error" is because they would not read README. Also, I think "invalid input syntax" is a bit ambiguous. For example, COPY FROM raises an error when the number of input column does not match to the table schema, but this error is not ignored by ON_ERROR while this seems to fall into the category of "invalid input syntax". I found the description as followings in the documentation in COPY: The column values themselves are strings (...) acceptable to the input function, of each attribute's data type. So, keeping consistency with the existing description, we can say: "Specifies which how to behave when encountering an error due to column values unacceptable to the input function of each attribute's data type." Currently, ON_ERROR doesn't support other soft errors, so it can explain it more simply without introducing the new concept, "soft error" to users. > I also note the log message behavior when ignore mode is chosen. I haven't > confirmed that it is accurate but that is readily tweaked if approved of. > + An INFO level context message containing the ignored row count is + emitted at the end of the COPY FROM if at least one row was discarded. The log level is NOTICE not INFO. -COPY stops operation at the first error when -ON_ERROR is not specified. This -should not lead to problems in the event of a COPY -TO, but the target table will already have received -earlier rows in a COPY FROM. These rows will not -be visible or accessible, but they still occupy disk space. This might +The COPY FROM command physically inserts input rows +into the table as it progresses. If the command fails these rows are +left in a deleted state, still occupying disk space. This might amount to a considerable amount of wasted disk space if the failure -happened well into a large copy operation. You might wish to invoke -VACUUM to recover the wasted space. +happened well into a large copy operation. VACUUM +should be used to recover the wasted space. I think "left in a deleted state" is also unclear for users because this explains the internal state but not how looks from user's view.How about leaving the explanation "These rows will not be visible or accessible" in the existing statement? Regards, Yugo Nagata -- Yugo NAGATA
Re: Small fix on COPY ON_ERROR document
On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA wrote: > On Fri, 26 Jan 2024 00:00:57 -0700 > "David G. Johnston" wrote: > > > I will need to make this tweak and probably a couple others to my own > > suggestions in 12 hours or so. > > > > And here is my v2. Notably I choose to introduce the verbiage "soft error" and then define in the ON_ERROR clause the specific soft error that matters here - "invalid input syntax". I also note the log message behavior when ignore mode is chosen. I haven't confirmed that it is accurate but that is readily tweaked if approved of. David J. From 2d656fe0a69ea1349472d88794c18f8a2e2d37e9 Mon Sep 17 00:00:00 2001 From: "David G. Johnston" Date: Thu, 25 Jan 2024 23:35:44 -0700 Subject: [PATCH] v2 Improve ON_ERROR verbiage in COPY documentation --- doc/src/sgml/ref/copy.sgml | 48 +++--- 1 file changed, 29 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..38ce0b0a4c 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -90,6 +90,13 @@ COPY { table_name [ ( pg_stat_progress_copy view. See for details. + + +By default, COPY will fail if it encounters an error +during processing. For use cases where a best-effort attempt at loading +the entire file is desired, the ON_ERROR clause can +be used to specify some other behavior. + @@ -378,17 +385,23 @@ COPY { table_name [ ( ON_ERROR - Specifies which - error_action to perform when there is malformed data in the input. - Currently, only stop (default) and ignore - values are supported. - If the stop value is specified, - COPY stops operation at the first error. - If the ignore value is specified, - COPY skips malformed data and continues copying data. - The option is allowed only in COPY FROM. - Only stop value is allowed when - using binary format. + Specifies which how to behave when encountering a soft error. + An error_action value of + stop means fail the command, while + ignore means discard the input row and continue with the next one. + The default is stop + + + The ignore option is applicable only for COPY FROM + when the FORMAT is text or csv. + + + The only relevant soft error is "invalid input syntax", which manifests when attempting + to create a column value from the text input. + + + An INFO level context message containing the ignored row count is + emitted at the end of the COPY FROM if at least one row was discarded. @@ -576,15 +589,12 @@ COPY count -COPY stops operation at the first error when -ON_ERROR is not specified. This -should not lead to problems in the event of a COPY -TO, but the target table will already have received -earlier rows in a COPY FROM. These rows will not -be visible or accessible, but they still occupy disk space. This might +The COPY FROM command physically inserts input rows +into the table as it progresses. If the command fails these rows are +left in a deleted state, still occupying disk space. This might amount to a considerable amount of wasted disk space if the failure -happened well into a large copy operation. You might wish to invoke -VACUUM to recover the wasted space. +happened well into a large copy operation. VACUUM +should be used to recover the wasted space. -- 2.34.1
Re: Small fix on COPY ON_ERROR document
On Fri, 26 Jan 2024 00:00:57 -0700 "David G. Johnston" wrote: > On Thursday, January 25, 2024, Yugo NAGATA wrote: > > > > > Maybe, we can separate the sentese to two, for example: > > > > COPY stops operation at the first error. (The exception is if the error > > is due to data type incompatibility and a value other than stop is > > specified > > to the ON_ERROR option.) > > > > I’d lean more toward saying: > > Copy from can be instructed to ignore errors that arise from casting input > data to the data types of the target columns by setting the on_error option > to ignore. Skipping the entire input row in the process. > > The last part is because another way to ignore would be to set null values > for those columns. That makes sense. Is is a bit ambiguous to just say "skips malformed data"; it might be unclear for users if the data in the problematic column is skipped (NULL is set) or the entire row is skipped. Also, setting null values for those columns could be a future feature of ON_ERROR option. > > That a command stops on an error is the assumed behavior throughout the > system, writing “copy stops operation at the first error.” just seems > really unnecessary. I think we need to describe this default behavior explicitly somewhere, as you suggested in the other post [1]. [1] https://www.postgresql.org/message-id/CAKFQuwZJZ6uaS2B7qpL2FJzWBsnDdzgtbsW3pH9zuT6vC3fH%2Bg%40mail.gmail.com Regards, Yugo Nagata > I will need to make this tweak and probably a couple others to my own > suggestions in 12 hours or so. > > David J. -- Yugo NAGATA
Re: Small fix on COPY ON_ERROR document
On Thu, 25 Jan 2024 23:33:22 -0700 "David G. Johnston" wrote: > On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA wrote: > > > On Fri, 26 Jan 2024 13:59:09 +0900 > > Masahiko Sawada wrote: > > > > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA > > wrote: > > > > > > > > Hi, > > > > > > > > I found that the documentation of COPY ON_ERROR said > > > > COPY stops operation at the first error when > > > > "ON_ERROR is not specified.", but it also stop when > > > > ON_ERROR is specified to the default value, "stop". > > > > > > > > > I'm finding the current wording surrounding ON_ERROR to not fit in with the > rest of the page. I've tried to improve things by being a bit more > invasive. Introducing the ON_ERROR option changed the behavior of COPY about error handling to some extent, so it might be good to rewrite a bit more parts of the documentation as you suggest. > This first hunk updates the description of the COPY command to include > describing the purpose of the ON_ERROR clause. > > I too am concerned about the word "parsed" here, and "malformed" in the > more detailed descriptions; this would need to be modified to better > reflect the circumstances under which ignore happens. I think "errors due to data type incompatibility" would be nice to describe the errors to be ignored by ON_ERROR, as used in the NOTICE message output. > diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml > index 21a5c4a052..5fe4c9f747 100644 > --- a/doc/src/sgml/ref/copy.sgml > +++ b/doc/src/sgml/ref/copy.sgml > @@ -90,6 +90,14 @@ COPY { class="parameter">table_name [ ( in the pg_stat_progress_copy view. See > for details. > > + > + > +By default, COPY will abort if it encounters errors. > +For non-binary format COPY FROM the user can specify > +to instead ignore input rows that cannot be parsed by specifying > +the ignore option to the ON_ERROR > +clause. > + > > > > The following was, IMO, too much text about something not to worry about, > COPY TO and ignore mode. The point is dead tuples on error and running > vacuum. It doesn't really even matter what caused the error, though if the > error was even before rows started to be imported then obviously there > would be no dead tuples. > > Oh, and the word "first" really isn't adding anything here that we cannot > reasonably leave to common sense, IMO. We either ignore all errors or stop > on the first one. There isn't a stop mode that is capable of bypassing > errors and the ignore mode doesn't have a "first n" option so one assumes > all errors are ignored. > > @@ -576,15 +583,12 @@ COPY class="parameter">count > > > > -COPY stops operation at the first error when > -ON_ERROR is not specified. This > -should not lead to problems in the event of a COPY > -TO, but the target table will already have received > -earlier rows in a COPY FROM. These rows will not > -be visible or accessible, but they still occupy disk space. This might > -amount to a considerable amount of wasted disk space if the failure > -happened well into a large copy operation. You might wish to invoke > -VACUUM to recover the wasted space. > +A failed COPY FROM command will have performed > +physical insertion of all rows prior to the malformed one. As you said that it does not matter what caused the error, I don't think we have to use "malformed" here. Instead, we could say "we will have performed physical insertion of rows before the failure." > +While these rows will not be visible or accessible, they still occupy > +disk space. This might amount to a considerable amount of wasted disk > +space if the failure happened well into a large copy operation. > +VACUUM should be used to recover the wasted space. > > > > > David J. Regards, Yugo Nagata -- Yugo NAGATA
Re: Small fix on COPY ON_ERROR document
On Thursday, January 25, 2024, Yugo NAGATA wrote: > > Maybe, we can separate the sentese to two, for example: > > COPY stops operation at the first error. (The exception is if the error > is due to data type incompatibility and a value other than stop is > specified > to the ON_ERROR option.) > I’d lean more toward saying: Copy from can be instructed to ignore errors that arise from casting input data to the data types of the target columns by setting the on_error option to ignore. Skipping the entire input row in the process. The last part is because another way to ignore would be to set null values for those columns. That a command stops on an error is the assumed behavior throughout the system, writing “copy stops operation at the first error.” just seems really unnecessary. I will need to make this tweak and probably a couple others to my own suggestions in 12 hours or so. David J.
Re: Small fix on COPY ON_ERROR document
On Fri, 26 Jan 2024 15:26:55 +0900 Masahiko Sawada wrote: > On Fri, Jan 26, 2024 at 2:40 PM Yugo NAGATA wrote: > > > > On Fri, 26 Jan 2024 13:59:09 +0900 > > Masahiko Sawada wrote: > > > > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA wrote: > > > > > > > > Hi, > > > > > > > > I found that the documentation of COPY ON_ERROR said > > > > COPY stops operation at the first error when > > > > "ON_ERROR is not specified.", but it also stop when > > > > ON_ERROR is specified to the default value, "stop". > > > > > > > > I attached a very small patch to fix this just for > > > > making the description more accurate. > > > > > > Thank you for the patch! > > > > > > +1 to fix it. > > > > > > -ON_ERROR is not specified. This > > > -should not lead to problems in the event of a COPY > > > +ON_ERROR is not specified or > > > stop. > > > +This should not lead to problems in the event of a COPY > > > > > > How about the followings for consistency with the description of the > > > ON_ERROR option? > > > > > > COPY stops operation at the first error if the stop value is specified > > > to the ON_ERROR option. This should not lead to ... > > > > Thank you for you review. However, after posting the previous patch, > > I noticed that I overlooked ON_ERROR works only for errors due to data > > type incompatibility (that is mentioned as "malformed data" in the > > ON_ERROR description, though). > > Right. > > > > > So, how about rewriting this like: > > > > COPY stops operation at the first error unless the error is due to data > > type incompatibility and a value other than stop is specified to the > > ON_ERROR option. > > Hmm, this sentence seems not very readable to me, especially the "COPY > stops ... unless ... a value other than stop is specified ..." part. I > think we can simplify it: I can agree with your opinion on the readability, but... > COPY stops operation at the first data type incompatibility error if > the stop value is specified to the ON_ERROR option. This statement doesn't explain COPY also stops when an error other than data type incompatibility (e.g. constrain violations) occurs. Maybe, we can separate the sentese to two, for example: COPY stops operation at the first error. (The exception is if the error is due to data type incompatibility and a value other than stop is specified to the ON_ERROR option.) Regards, Yugo Nagata > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com -- Yugo NAGATA
Re: Small fix on COPY ON_ERROR document
On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA wrote: > On Fri, 26 Jan 2024 13:59:09 +0900 > Masahiko Sawada wrote: > > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA > wrote: > > > > > > Hi, > > > > > > I found that the documentation of COPY ON_ERROR said > > > COPY stops operation at the first error when > > > "ON_ERROR is not specified.", but it also stop when > > > ON_ERROR is specified to the default value, "stop". > > > > > I'm finding the current wording surrounding ON_ERROR to not fit in with the rest of the page. I've tried to improve things by being a bit more invasive. This first hunk updates the description of the COPY command to include describing the purpose of the ON_ERROR clause. I too am concerned about the word "parsed" here, and "malformed" in the more detailed descriptions; this would need to be modified to better reflect the circumstances under which ignore happens. diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..5fe4c9f747 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -90,6 +90,14 @@ COPY { table_name [ ( pg_stat_progress_copy view. See for details. + + +By default, COPY will abort if it encounters errors. +For non-binary format COPY FROM the user can specify +to instead ignore input rows that cannot be parsed by specifying +the ignore option to the ON_ERROR +clause. + The use of the word "Currently" stood out to me when reading the next hunk. We always document current reality and don't call out that fact. We do not imply some future feature may change how this all works. On a related note, right now we have stop and ignore, which are basically enums just like we have for format (csv, text, binary). Unlike format, though, this option requires single quotes. Why is that? I did keep with not specifying the enum in the main syntax block since format doesn't as well; though writing { stop | ignore } seemed quite appealing. The rest of the page indicates what the default is in a sentence by itself, not as a parenthetical next to the option. The command limitations seemed worthy of a separate paragraph, though it repeats the description which isn't great. I'm going to sleep on this one. @@ -379,17 +387,16 @@ COPY { table_name [ ( Specifies which - error_action to perform when there is malformed data in the input. - Currently, only stop (default) and ignore - values are supported. - If the stop value is specified, - COPY stops operation at the first error. - If the ignore value is specified, - COPY skips malformed data and continues copying data. - The option is allowed only in COPY FROM. - Only stop value is allowed when - using binary format. + error_action to perform when encountering a malformed input row: + stop or ignore. + A value of stop means fail the command, while + ignore means discard the input row and continue with the next one. + The default is stop + + The ignore option is applicable only for COPY FROM + when the FORMAT is text + or csv. + The following was, IMO, too much text about something not to worry about, COPY TO and ignore mode. The point is dead tuples on error and running vacuum. It doesn't really even matter what caused the error, though if the error was even before rows started to be imported then obviously there would be no dead tuples. Oh, and the word "first" really isn't adding anything here that we cannot reasonably leave to common sense, IMO. We either ignore all errors or stop on the first one. There isn't a stop mode that is capable of bypassing errors and the ignore mode doesn't have a "first n" option so one assumes all errors are ignored. @@ -576,15 +583,12 @@ COPY count -COPY stops operation at the first error when -ON_ERROR is not specified. This -should not lead to problems in the event of a COPY -TO, but the target table will already have received -earlier rows in a COPY FROM. These rows will not -be visible or accessible, but they still occupy disk space. This might -amount to a considerable amount of wasted disk space if the failure -happened well into a large copy operation. You might wish to invoke -VACUUM to recover the wasted space. +A failed COPY FROM command will have performed +physical insertion of all rows prior to the malformed one. +While these rows will not be visible or accessible, they still occupy +disk space. This might amount to a considerable amount of wasted disk +space if the failure happened well into a large copy operation. +VACUUM should be used to recover the wasted space. David J.
Re: Small fix on COPY ON_ERROR document
On Fri, Jan 26, 2024 at 2:40 PM Yugo NAGATA wrote: > > On Fri, 26 Jan 2024 13:59:09 +0900 > Masahiko Sawada wrote: > > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA wrote: > > > > > > Hi, > > > > > > I found that the documentation of COPY ON_ERROR said > > > COPY stops operation at the first error when > > > "ON_ERROR is not specified.", but it also stop when > > > ON_ERROR is specified to the default value, "stop". > > > > > > I attached a very small patch to fix this just for > > > making the description more accurate. > > > > Thank you for the patch! > > > > +1 to fix it. > > > > -ON_ERROR is not specified. This > > -should not lead to problems in the event of a COPY > > +ON_ERROR is not specified or > > stop. > > +This should not lead to problems in the event of a COPY > > > > How about the followings for consistency with the description of the > > ON_ERROR option? > > > > COPY stops operation at the first error if the stop value is specified > > to the ON_ERROR option. This should not lead to ... > > Thank you for you review. However, after posting the previous patch, > I noticed that I overlooked ON_ERROR works only for errors due to data > type incompatibility (that is mentioned as "malformed data" in the > ON_ERROR description, though). Right. > > So, how about rewriting this like: > > COPY stops operation at the first error unless the error is due to data > type incompatibility and a value other than stop is specified to the > ON_ERROR option. Hmm, this sentence seems not very readable to me, especially the "COPY stops ... unless ... a value other than stop is specified ..." part. I think we can simplify it: COPY stops operation at the first data type incompatibility error if the stop value is specified to the ON_ERROR option. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Small fix on COPY ON_ERROR document
On Fri, 26 Jan 2024 13:59:09 +0900 Masahiko Sawada wrote: > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA wrote: > > > > Hi, > > > > I found that the documentation of COPY ON_ERROR said > > COPY stops operation at the first error when > > "ON_ERROR is not specified.", but it also stop when > > ON_ERROR is specified to the default value, "stop". > > > > I attached a very small patch to fix this just for > > making the description more accurate. > > Thank you for the patch! > > +1 to fix it. > > -ON_ERROR is not specified. This > -should not lead to problems in the event of a COPY > +ON_ERROR is not specified or stop. > +This should not lead to problems in the event of a COPY > > How about the followings for consistency with the description of the > ON_ERROR option? > > COPY stops operation at the first error if the stop value is specified > to the ON_ERROR option. This should not lead to ... Thank you for you review. However, after posting the previous patch, I noticed that I overlooked ON_ERROR works only for errors due to data type incompatibility (that is mentioned as "malformed data" in the ON_ERROR description, though). So, how about rewriting this like: COPY stops operation at the first error unless the error is due to data type incompatibility and a value other than stop is specified to the ON_ERROR option. I attached the updated patch. Regards, Yugo Nagata > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com -- Yugo NAGATA diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..3676bf0e87 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -576,9 +576,10 @@ COPY count -COPY stops operation at the first error when -ON_ERROR is not specified. This -should not lead to problems in the event of a COPY +COPY stops operation at the first error unless the error +is due to data type incompatibility and a value other than +stop is specified to the ON_ERROR option. +This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might
Re: Small fix on COPY ON_ERROR document
On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA wrote: > > Hi, > > I found that the documentation of COPY ON_ERROR said > COPY stops operation at the first error when > "ON_ERROR is not specified.", but it also stop when > ON_ERROR is specified to the default value, "stop". > > I attached a very small patch to fix this just for > making the description more accurate. Thank you for the patch! +1 to fix it. -ON_ERROR is not specified. This -should not lead to problems in the event of a COPY +ON_ERROR is not specified or stop. +This should not lead to problems in the event of a COPY How about the followings for consistency with the description of the ON_ERROR option? COPY stops operation at the first error if the stop value is specified to the ON_ERROR option. This should not lead to ... Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com