Re: Allow COPY's 'text' format to output a header
On Fri, Aug 17, 2018 at 01:39:11PM +0900, Michael Paquier wrote: > The point about the header matching mentioned upthread is quite > interesting as it could make the proposed feature way more useful, and > it has not really been discussed. As far as I can see this adds more > sanity checks in NextCopyFromRawFields(). I'd like to think that this > should be a completely different option, say CHECK_HEADER, as CSV simply > skips the header in COPY FROM if specified on HEAD. It has been a couple of weeks since the last review, which has not been addressed, so I am marking the patch as returned with feedback. -- Michael signature.asc Description: PGP signature
Re: Allow COPY's 'text' format to output a header
On Thu, Aug 09, 2018 at 10:37:28AM -0400, Cynthia Shang wrote: > This patch looks good. I realized I should have changed the status > back while we were discussing all this. It is now (and still is) ready > for committer. I have some comments. -ERROR: COPY HEADER available only in CSV mode +ERROR: cannot specify HEADER in BINARY mode This should read "COPY HEADER not available in BINARY mode" perhaps? +copy copytest3 from stdin csv header; +copy copytest3 to stdout csv header; It would be more interesting to first export the data into the file with a header, truncate the relation, and import it back with again header specified. The data of the original should match the new, for both text and csv format. CopyStateData defines header_line, which still assumes that only CSV is supported. Why are there no additional tests for file_fdw? The point about the header matching mentioned upthread is quite interesting as it could make the proposed feature way more useful, and it has not really been discussed. As far as I can see this adds more sanity checks in NextCopyFromRawFields(). I'd like to think that this should be a completely different option, say CHECK_HEADER, as CSV simply skips the header in COPY FROM if specified on HEAD. -- Michael signature.asc Description: PGP signature
Re: Allow COPY's 'text' format to output a header
> On Aug 8, 2018, at 2:57 PM, Simon Muller wrote: > > If there's a merge conflict against master, then it'd be good for an > updated patch to be posted. > > Thanks! > > Stephen > > Attached is an updated patch that should directly apply against current > master. > > -- > Simon Muller > > This patch looks good. I realized I should have changed the status back while we were discussing all this. It is now (and still is) ready for committer. Thanks, -Cynthia
Re: Allow COPY's 'text' format to output a header
On 6 August 2018 at 16:34, Stephen Frost wrote: > Greetings, > > * Cynthia Shang (cynthia.sh...@crunchydata.com) wrote: > > I was able to apply the patch (after resolving a merge conflict which > was expected given an update in master). All looks good. > > If there's a merge conflict against master, then it'd be good for an > updated patch to be posted. > > Thanks! > > Stephen > Attached is an updated patch that should directly apply against current master. -- Simon Muller text_header_v6.patch Description: Binary data
Re: Allow COPY's 'text' format to output a header
Greetings, * Cynthia Shang (cynthia.sh...@crunchydata.com) wrote: > > On Aug 2, 2018, at 3:30 PM, Simon Muller wrote: > > > > Sure, thanks both for the feedback. Attached is a patch with the error kept > > as ERRCODE_FEATURE_NOT_SUPPORTED. > > I was able to apply the patch (after resolving a merge conflict which was > expected given an update in master). All looks good. If there's a merge conflict against master, then it'd be good for an updated patch to be posted. Thanks! Stephen signature.asc Description: PGP signature
Re: Allow COPY's 'text' format to output a header
> On Aug 2, 2018, at 3:30 PM, Simon Muller wrote: > > Sure, thanks both for the feedback. Attached is a patch with the error kept > as ERRCODE_FEATURE_NOT_SUPPORTED. > I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looks good. -Cynthia
Re: Allow COPY's 'text' format to output a header
On 2 August 2018 at 17:07, Cynthia Shang wrote: > > > On Aug 2, 2018, at 8:11 AM, Daniel Verite > wrote: > > > > That makes sense, thanks for elaborating, although there are also > > a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c > > that are raised on forbidden/nonsensical combination of features, > > so the consistency argument could work both ways. > > > > If there is not a strong reason to change the error code, then I believe > we should not. The error is the same as it was before, just narrower in > scope. > > Best, > -Cynthia Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED. -- Simon Muller text_header_v5.patch Description: Binary data
Re: Allow COPY's 'text' format to output a header
> On Aug 2, 2018, at 8:11 AM, Daniel Verite wrote: > > That makes sense, thanks for elaborating, although there are also > a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c > that are raised on forbidden/nonsensical combination of features, > so the consistency argument could work both ways. > If there is not a strong reason to change the error code, then I believe we should not. The error is the same as it was before, just narrower in scope. Best, -Cynthia
Re: Allow COPY's 'text' format to output a header
Simon Muller wrote: > I changed the error type and message for consistency with other similar > errors in that file. Whenever options are combined that are incompatible, > it looks like the convention is for a ERRCODE_SYNTAX_ERROR to be thrown. That makes sense, thanks for elaborating, although there are also a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c that are raised on forbidden/nonsensical combination of features, so the consistency argument could work both ways. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Allow COPY's 'text' format to output a header
On 1 August 2018 at 17:18, Cynthia Shang wrote: > > > On Aug 1, 2018, at 10:20 AM, Daniel Verite > wrote: > > > > /* Check header */ > > - if (!cstate->csv_mode && cstate->header_line) > > + if (cstate->binary && cstate->header_line) > > ereport(ERROR, > > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > -errmsg("COPY HEADER available only in CSV mode"))); > > + (errcode(ERRCODE_SYNTAX_ERROR), > > + errmsg("cannot specify HEADER in BINARY mode"))); > > > > Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR? > > > > I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also > suggest the message read "COPY HEADER not available in BINARY mode", > although I'm pretty agnostic on the latter. > > Regards, > -Cynthia Shang I changed the error type and message for consistency with other similar errors in that file. Whenever options are combined that are incompatible, it looks like the convention is for a ERRCODE_SYNTAX_ERROR to be thrown. For instance, in case you both specify a specific DELIMITER but also declare the format as BINARY, then there is this code in that same file: if (cstate->binary && cstate->delim) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("cannot specify DELIMITER in BINARY mode"))); HEADER seems very similar to me since, like DELIMITER, it makes sense for the textual formats such as CSV and TEXT, but doesn't make sense with the BINARY format. ERRCODE_FEATURE_NOT_SUPPORTED previously made sense since the only reason TEXT and HEADER weren't compatible options was because the feature was not yet implemented, but now ERRCODE_SYNTAX_ERROR seems to make sense to me since I can't foresee a use case where BINARY and HEADER would ever be compatible options. -- Simon Muller
Re: Allow COPY's 'text' format to output a header
Simon Muller wrote: > I've incorporated both your suggestions and included the patch you provided > in the attached patch. Hope it's as expected. Still unconvinced about the use case, since COPY's text format is only meant to be consumed by Postgres, and the only way that Postgres will consume this header is to discard it (at least as of the current patch). But anyway... /* Check header */ - if (!cstate->csv_mode && cstate->header_line) + if (cstate->binary && cstate->header_line) ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), -errmsg("COPY HEADER available only in CSV mode"))); +(errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot specify HEADER in BINARY mode"))); Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Allow COPY's 'text' format to output a header
> On Jul 25, 2018, at 6:09 PM, Simon Muller wrote: > > I've incorporated both your suggestions and included the patch you provided > in the attached patch. Hope it's as expected. > -- > Simon Muller > > Reviewed and retested. Changing status to Ready for Committer.
Re: Allow COPY's 'text' format to output a header
On 25 July 2018 at 19:24, Cynthia Shang wrote: > > I've reviewed this patch and feel this patch addresses the original ask. I > tested it manually trying to break it and, as mentioned previously, it's > behavior is the same as the CSV copy with regards to it's shortcomings. > However, I feel > 1) a "copy from" test is needed and > 2) the current "copy to" test is (along with a few others) in the wrong > file. > > With regards to #2, the copy.source tests are for things requiring > replacement when running the tests. Given that these copy tests do not, I > have moved the current last set of copy tests to the copy2.sql file and > have provided an attached patch. > > Thanks for reviewing the patch. I agree that moving those previous and these new tests out of the .source files seems to make more sense as they don't make use of the preprocessing/replacement feature. With regards to #1, the patch I have provided can then be used and the > following added as the COPY TO/FROM tests (perhaps after line 426 of the > attached copy2.sql file). Note that I moved the FROM test before the TO > test and omitted the "(format text, header true)" in the FROM test since it > is another way the command can be invoked. > > copy copytest3 from stdin header; > this is just a line full of junk that would error out if parsed > 11 a 1 > 22 b 2 > \. > > copy copytest3 to stdout with (format text, header true); > > > I've incorporated both your suggestions and included the patch you provided in the attached patch. Hope it's as expected. > As for the matching check of the header in the discussion of this patch, I > feel that is a separate patch that can be added later since it would affect > the general functionality of the copy command, not just the ability to have > a text header. > > Best, > - Cynthia Shang > P.S. I did receive the first attached patch, but on my Ubuntu I had to apply it using "git apply --ignore-space-change --ignore-whitespace", probably due to line ending differences. -- Simon Muller text_header_v4.patch Description: Binary data
Re: Allow COPY's 'text' format to output a header
On Wed, Jul 25, 2018 at 1:24 PM, Cynthia Shang wrote: > With regards to #2, the copy.source tests are for things requiring > replacement when running the tests. Given that these copy tests do not, I > have moved the current last set of copy tests to the copy2.sql file and have > provided an attached patch. The patch appears in the RAW and in your email (hopefully) but it doesn't appear in the thread archive so I am reattaching from a different email client. move-copy-tests-v1.patch Description: Binary data
Re: Allow COPY's 'text' format to output a header
On 4 July 2018 at 22:44, Simon Mullerwrote:I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").This v2 patch should fix that.This patch just fixes a newline issue introduced in my previous patch.I've reviewed this patch and feel this patch addresses the original ask. I tested it manually trying to break it and, as mentioned previously, it's behavior is the same as the CSV copy with regards to it's shortcomings. However, I feel 1) a "copy from" test is needed and 2) the current "copy to" test is (along with a few others) in the wrong file. With regards to #2, the copy.source tests are for things requiring replacement when running the tests. Given that these copy tests do not, I have moved the current last set of copy tests to the copy2.sql file and have provided an attached patch. With regards to #1, the patch I have provided can then be used and the following added as the COPY TO/FROM tests (perhaps after line 426 of the attached copy2.sql file). Note that I moved the FROM test before the TO test and omitted the "(format text, header true)" in the FROM test since it is another way the command can be invoked.copy copytest3 from stdin header;this is just a line full of junk that would error out if parsed11 a 122 b 2\.copy copytest3 to stdout with (format text, header true);As for the matching check of the header in the discussion of this patch, I feel that is a separate patch that can be added later since it would affect the general functionality of the copy command, not just the ability to have a text header. move-copy-tests-v1.patch Description: Binary data Best,- Cynthia Shang
Re: Allow COPY's 'text' format to output a header
On 4 July 2018 at 22:44, Simon Muller wrote: > I noticed through the patch tester link at http://commitfest.cputube.org/ > that my patch caused a file_fdw test to fail (since I previously tested > only with "make check" and not with "make check-world"). > > This v2 patch should fix that. > This patch just fixes a newline issue introduced in my previous patch. text_header_v3.patch Description: Binary data
Re: Allow COPY's 'text' format to output a header
On 14 May 2018 at 08:35, Simon Muller wrote: > Okay, I've added this to the next commitfest at > https://commitfest.postgresql.org/18/1629/. > > Thanks both Michael and David for the feedback so far. > I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world"). This v2 patch should fix that. text_header_v2.patch Description: Binary data
Re: Allow COPY's 'text' format to output a header
Isaac Morland wrote: > Just to be clear, we're talking about my "header match" feature, not the > basic idea of allowing a header in text format? For my reply it was on merely allowing it, as does the current patch at https://commitfest.postgresql.org/18/1629 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Allow COPY's 'text' format to output a header
On Tuesday, May 15, 2018, Tom Lanewrote: > > > AFAICS, Daniel's just reacting to the basic idea of a header line. > I agree that by itself that's not worth much. However, if we added > your proposed option to insist that the column names match during COPY > IN, I think that that could have some value. > I'm fine for adding it without the added matching behavior, though turning the boolean into an enum is appealing. HEADER { true | false | match } Though we'd need to accept all variants of Boolean for compatability... I'm of the opinion that text and csv should be the same excepting their defaults for some of the options. David J.
Re: Allow COPY's 'text' format to output a header
Isaac Morlandwrites: > On 15 May 2018 at 10:26, Daniel Verite wrote: >> Andrew Dunstan wrote: >>> I'm not necessarily opposed to this, but I'm not certain about the use >>> case either. >> The downside is that it would create the need, when using COPY TO, >> to know whether an input file was generated with or without header, >> and a hazard on mistakes. >> If you say it was and it wasn't, you quietly loose the first row of data. >> If you say it wasn't and in fact it was, either there's a >> datatype mismatch or you quietly get a spurious row of data. > Just to be clear, we're talking about my "header match" feature, not the > basic idea of allowing a header in text format? AFAICS, Daniel's just reacting to the basic idea of a header line. I agree that by itself that's not worth much. However, if we added your proposed option to insist that the column names match during COPY IN, I think that that could have some value. It would allow forestalling one common type of pilot error, ie copying the wrong file entirely. (It'd also prevent copying in data that has the wrong column order, but I think that's a less common scenario. I might be wrong about that.) > One can imagine extensions of the idea: for example, the header could > actually be used to identify the columns, so the column order in the file > doesn't matter. There could also be an "AS" syntax to allow the target > field names to be different from the field names in the header. I have > occasionally found myself wanting to ignore certain columns of the file. > But these are all significantly more complicated than just looking at the > header and requiring it to match the target field names. Yeah, and every bit of flexibility you add raises the chance of an undetected error. COPY isn't intended as a general ETL facility, so I'd mostly be -1 on adding such things. But I can see the value of confirming that you're copying the right file, and a header match check would go a long way towards doing that. regards, tom lane
Re: Allow COPY's 'text' format to output a header
Andrew Dunstan wrote: > I'm not necessarily opposed to this, but I'm not certain about the use > case either. +1. The downside is that it would create the need, when using COPY TO, to know whether an input file was generated with or without header, and a hazard on mistakes. If you say it was and it wasn't, you quietly loose the first row of data. If you say it wasn't and in fact it was, either there's a datatype mismatch or you quietly get a spurious row of data. This complication should be balanced by some advantage. What can we do with the header? If you already have the table ready to COPY in, you don't need that information. The only reason why COPY TO needs to know about the header is to throw it away. And if you don't have the table created yet, a header with just the column names is hardly sufficient to create it, isn't it? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Allow COPY's 'text' format to output a header
On Mon, May 14, 2018 at 04:08:47PM -0400, Isaac Morland wrote: > While we're discussing COPY options, what do people think of an option for > COPY FROM with header to require that the headers match the target column > names? This would help to ensure that the file is actually the right one. I am personally not much into such sanity check logics in COPY FWIW if we can live without. -- Michael signature.asc Description: PGP signature
Re: Allow COPY's 'text' format to output a header
While we're discussing COPY options, what do people think of an option for COPY FROM with header to require that the headers match the target column names? This would help to ensure that the file is actually the right one. On 14 May 2018 at 14:55, David G. Johnstonwrote: > On Mon, May 14, 2018 at 11:44 AM, Garick Hamlin > wrote: > >> I wonder if there was a way to let COPY FROM detect or ignore headers >> > as appropriate and rather than cause silently result in headers being >> added as data. >> > > Not reliably > > >> Maybe a blank line after the header line could prevent this confusion >> > > No > > +1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY > and even if I were to agree that CSV format is the better one this seems > like an unnecessary area to impose preferences. If TSV with Header meets > someone's need providing a minimal (and consistent with expectations) > syntax to accomplish that goal seems reasonable, as does the patch. > > David J. > >
Re: Allow COPY's 'text' format to output a header
On Mon, May 14, 2018 at 11:44 AM, Garick Hamlinwrote: > I wonder if there was a way to let COPY FROM detect or ignore headers > as appropriate and rather than cause silently result in headers being > added as data. > Not reliably > Maybe a blank line after the header line could prevent this confusion > No +1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY and even if I were to agree that CSV format is the better one this seems like an unnecessary area to impose preferences. If TSV with Header meets someone's need providing a minimal (and consistent with expectations) syntax to accomplish that goal seems reasonable, as does the patch. David J.
Re: Allow COPY's 'text' format to output a header
On Mon, May 14, 2018 at 09:37:07AM +0900, Michael Paquier wrote: > On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote: > > This patch makes sense to me and looks reasonable. > > One "potential" problem is if a relation has a full set of column which > allows the input of text-like data: if the header has been added with > COPY TO, and that the user forgets to add again the header option with > COPY FROM, then an extra row will be generated but there is the same > problem with CSV format :) Yeah, I wonder if that can be addressed. I wonder if there was a way to let COPY FROM detect or ignore headers as appropriate and rather than cause silently result in headers being added as data. Maybe a blank line after the header line could prevent this confusion? Garick
Re: Allow COPY's 'text' format to output a header
On 05/14/2018 02:35 AM, Simon Muller wrote: > Okay, I've added this to the next commitfest at > https://commitfest.postgresql.org/18/1629/. > > Thanks both Michael and David for the feedback so far. > > (Please don't top-post on PostgreSQL lists.) I'm not necessarily opposed to this, but I'm not certain about the use case either. The original request seemed to stem from a false impression that CSV mode can't produce or consume tab-delimited files. But it can, and in fact it's saner for almost all uses than text format. Postgres' text format is really intended for Postgres' use. CSV format is more appropriate for dealing with external programs, whether the delimiter be a tab or a comma. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Allow COPY's 'text' format to output a header
Okay, I've added this to the next commitfest at https://commitfest.postgresql.org/18/1629/. Thanks both Michael and David for the feedback so far. On 14 May 2018 at 02:37, Michael Paquierwrote: > On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote: > > This patch makes sense to me and looks reasonable. > > One "potential" problem is if a relation has a full set of column which > allows the input of text-like data: if the header has been added with > COPY TO, and that the user forgets to add again the header option with > COPY FROM, then an extra row will be generated but there is the same > problem with CSV format :) > > One comment I have about the patch is that there is no test for > COPY FROM with an output file which has a header. In this case if > HEADER is true then the file can be loaded. If HEADER is wrong, an > error should normally be raised because of the format (well, let's > discard the case of the relation with text-only columns). So the tests > could be extended a bit even for CSV. > > > We're in the middle of a feature freeze that will last most of the > > summer, so be sure to enter your patch into the next commitfest so it > > can be considered when the freeze is over. > > > > https://commitfest.postgresql.org/18/ > > Yes, you will need to be patient a couple of months here. > -- > Michael >
Re: Allow COPY's 'text' format to output a header
On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote: > This patch makes sense to me and looks reasonable. One "potential" problem is if a relation has a full set of column which allows the input of text-like data: if the header has been added with COPY TO, and that the user forgets to add again the header option with COPY FROM, then an extra row will be generated but there is the same problem with CSV format :) One comment I have about the patch is that there is no test for COPY FROM with an output file which has a header. In this case if HEADER is true then the file can be loaded. If HEADER is wrong, an error should normally be raised because of the format (well, let's discard the case of the relation with text-only columns). So the tests could be extended a bit even for CSV. > We're in the middle of a feature freeze that will last most of the > summer, so be sure to enter your patch into the next commitfest so it > can be considered when the freeze is over. > > https://commitfest.postgresql.org/18/ Yes, you will need to be patient a couple of months here. -- Michael signature.asc Description: PGP signature
Re: Allow COPY's 'text' format to output a header
Hi Simon, On 5/13/18 6:18 PM, Simon Muller wrote: > This patch adds the capability to use the HEADER feature with the "text" > format of the COPY command. The patch includes the related update to > documentation and an additional regression test for this feature. > > Currently you can only add a header line (which lists the column names) > when exporting with COPY to the CSV format, but I much prefer using the > default "text" format. This feature is also currently listed on the > to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems > to have been requested some years ago. > > Hopefully I've done everything correctly and the patch is acceptable > enough to be considered for application. This patch makes sense to me and looks reasonable. We're in the middle of a feature freeze that will last most of the summer, so be sure to enter your patch into the next commitfest so it can be considered when the freeze is over. https://commitfest.postgresql.org/18/ Regards, -- -David da...@pgmasters.net
Allow COPY's 'text' format to output a header
This patch adds the capability to use the HEADER feature with the "text" format of the COPY command. The patch includes the related update to documentation and an additional regression test for this feature. Currently you can only add a header line (which lists the column names) when exporting with COPY to the CSV format, but I much prefer using the default "text" format. This feature is also currently listed on the to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems to have been requested some years ago. Hopefully I've done everything correctly and the patch is acceptable enough to be considered for application. Simon Muller 0001-Allow-COPY-s-text-format-to-output-a-header.patch Description: Binary data