Re: [PATCHES] [HACKERS] multiline CSV fields

2004-12-03 Bruce Momjian
Patch applied. Thanks. --- Andrew Dunstan wrote: I wrote: If it bothers you that much, I'd make a flag, cleared at the start of each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, if

Re: [PATCHES] [HACKERS] multiline CSV fields

2004-12-02 Andrew Dunstan
I wrote: If it bothers you that much, I'd make a flag, cleared at the start of each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, if the flag is not set then set it and issue the warning. I didn't realise until Bruce told me just now that I was on the hook for this. I
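The warn-once flag described above can be sketched in plain C. This is an illustration, not the copy.c code: the `CopyState` struct, `copy_begin()`, `csv_field_out()` and the `warnings_issued` counter are hypothetical, and `fprintf(stderr, ...)` stands in for `elog(WARNING, ...)`. Quoting and the actual field output are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-COPY state (illustration only). */
typedef struct CopyState
{
    bool embedded_line_warning; /* set once a CR or LF has been seen */
    int  warnings_issued;       /* counted here only so the behaviour is observable */
} CopyState;

/* Clear the flag at the start of each COPY. */
static void copy_begin(CopyState *cstate)
{
    assert(cstate != NULL);
    cstate->embedded_line_warning = false;
    cstate->warnings_issued = 0;
}

/* Scan one field being written as CSV; warn only on the first CR or LF
 * encountered during the current COPY. */
static void csv_field_out(CopyState *cstate, const char *field)
{
    for (const char *p = field; *p != '\0'; p++)
    {
        char c = *p;

        if (!cstate->embedded_line_warning && (c == '\n' || c == '\r'))
        {
            cstate->embedded_line_warning = true;
            cstate->warnings_issued++;
            fprintf(stderr,
                    "WARNING: CSV fields with embedded linefeed or carriage return "
                    "characters might not be able to be reimported\n");
        }
    }
}
```

Because the flag is only reset in `copy_begin()`, a COPY that writes many multiline fields produces a single warning rather than one per row.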

Re: [HACKERS] multiline CSV fields

2004-12-02 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: + if (!embedded_line_warning && (c == '\n' || c == '\r') ) + { + embedded_line_warning = true; + elog(WARNING, + "CSV fields with embedded linefeed or carriage

Re: [HACKERS] multiline CSV fields

2004-12-02 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: + if (!embedded_line_warning && (c == '\n' || c == '\r') ) + { + embedded_line_warning = true; + elog(WARNING, + "CSV fields with embedded linefeed or carriage return " + "characters might not be able to be reimported"); +

Re: [HACKERS] multiline CSV fields

2004-11-30 Kris Jurka
On Tue, 30 Nov 2004, Greg Stark wrote: Andrew Dunstan [EMAIL PROTECTED] writes: The advantage of having it in COPY is that it can be done serverside direct from the file system. For massive bulk loads that might be a plus, although I don't know what the protocol+socket overhead is.

Re: [HACKERS] multiline CSV fields

2004-11-30 Andrew Dunstan
Greg Stark wrote: Personally I find the current CSV support inadequate. It seems pointless to support CSV if it can't load data exported from Excel, which seems like the main use case. OK, I'm starting to get mildly annoyed now. We have identified one failure case connected with multiline

Re: [HACKERS] multiline CSV fields

2004-11-30 Ben.Young
Greg Stark wrote: Personally I find the current CSV support inadequate. It seems pointless to support CSV if it can't load data exported from Excel, which seems like the main use case. OK, I'm starting to get mildly annoyed now. We have identified one failure case connected

Re: [HACKERS] multiline CSV fields

2004-11-30 Andrew Dunstan
[EMAIL PROTECTED] wrote: I am normally more of a lurker on these lists, but I thought you had better know that when we developed CSV import/export for an application at my last company we discovered that Excel can't always even read the CSV that _it_ has output! (With embedded newlines a

Re: [HACKERS] multiline CSV fields

2004-11-30 Greg Stark
Andrew Dunstan [EMAIL PROTECTED] writes: FWIW, I don't make a habit of using multiline fields in my spreadsheets - and some users I have spoken to aren't even aware that you can have them at all. Unfortunately I don't get a choice. I offer a field on the web site where users can upload an

Re: [HACKERS] multiline CSV fields

2004-11-30 Bruce Momjian
Andrew Dunstan wrote: Greg Stark wrote: Personally I find the current CSV support inadequate. It seems pointless to support CSV if it can't load data exported from Excel, which seems like the main use case. OK, I'm starting to get mildly annoyed now. We have identified one

Re: [HACKERS] multiline CSV fields

2004-11-30 Andrew Dunstan
Bruce Momjian wrote: I am wondering if one good solution would be to pre-process the input stream in copy.c to convert newline to \n and carriage return to \r and double data backslashes and tell copy.c to interpret those like it does for normal text COPY files. That way, the changes to copy.c
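Bruce's pre-processing idea amounts to rewriting each quoted CSV field into the escapes that text-mode COPY already understands, so that no raw CR or LF survives to confuse line splitting. A minimal sketch, assuming single-byte, ASCII-compatible data; `escape_for_text_copy()` is a hypothetical helper, and the encoding concerns raised elsewhere in the thread are not addressed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rewrite one field into text-COPY style escapes:
 *   backslash -> "\\", LF -> "\n", CR -> "\r".
 * out must hold at least 2 * strlen(in) + 1 bytes, since in the worst
 * case every input byte expands to two. Returns the length written.
 */
static size_t escape_for_text_copy(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++)
    {
        switch (*in)
        {
            case '\\': out[n++] = '\\'; out[n++] = '\\'; break; /* double data backslashes */
            case '\n': out[n++] = '\\'; out[n++] = 'n';  break; /* newline -> \n */
            case '\r': out[n++] = '\\'; out[n++] = 'r';  break; /* carriage return -> \r */
            default:   out[n++] = *in;                   break;
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, the 7-byte field `a\b<LF>c<CR>d` becomes the 10-byte text `a\\b\nc\rd`, which contains no line-end bytes at all.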

Re: [HACKERS] multiline CSV fields

2004-11-30 Bruce Momjian
Andrew Dunstan wrote: Bruce Momjian wrote: I am wondering if one good solution would be to pre-process the input stream in copy.c to convert newline to \n and carriage return to \r and double data backslashes and tell copy.c to interpret those like it does for normal text COPY files.

Re: [HACKERS] multiline CSV fields

2004-11-29 Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces inconsistencies elsewhere. Sure, pg_dump doesn't use it but COPY should be able to load anything it

Re: [HACKERS] multiline CSV fields

2004-11-29 Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces inconsistencies elsewhere. Sure, pg_dump doesn't use it but COPY should be

Re: [HACKERS] multiline CSV fields

2004-11-29 Andrew Dunstan
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces inconsistencies elsewhere. Sure, pg_dump

Re: [HACKERS] multiline CSV fields

2004-11-29 Bruce Momjian
Andrew Dunstan wrote: OK, then should we disallow dumping out data in CSV format that we can't load? Seems like the least we should do for 8.0. As Tom rightly points out, having data make the round trip was not the goal of the exercise. Excel, for example, has no trouble reading

Re: [HACKERS] multiline CSV fields

2004-11-29 Bruce Momjian
Bruce Momjian wrote: Andrew Dunstan wrote: OK, then should we disallow dumping out data in CSV format that we can't load? Seems like the least we should do for 8.0. As Tom rightly points out, having data make the round trip was not the goal of the exercise. Excel, for

Re: [HACKERS] multiline CSV fields

2004-11-29 Andrew Dunstan
Bruce Momjian wrote: Andrew Dunstan wrote: OK, then should we disallow dumping out data in CSV format that we can't load? Seems like the least we should do for 8.0. As Tom rightly points out, having data make the round trip was not the goal of the exercise. Excel, for example, has no

Re: [HACKERS] multiline CSV fields

2004-11-29 Andrew Dunstan
Bruce Momjian wrote: Also, can you explain why we can't read across a newline to the next quote? Is it a problem with the way our code is structured or is it a logical problem? Someone mentioned multibyte encodings but I don't understand how that applies here. In a CSV file, each line is a

Re: [HACKERS] multiline CSV fields

2004-11-29 Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Also, can you explain why we can't read across a newline to the next quote? Is it a problem with the way our code is structured or is it a logical problem? It's a structural issue in the sense that we separate the act of dividing the input into rows
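To illustrate why a row splitter that runs before field parsing cannot cope with quoted newlines: a reader that divides input into rows without knowing about quotes must treat every LF as a record terminator. The hypothetical `count_rows_naive()` below does exactly that; it is a simplified stand-in, not the real CopyReadLine.

```c
#include <assert.h>
#include <stddef.h>

/* Count records by treating every LF as a row terminator, as a
 * line-at-a-time reader with no knowledge of CSV quoting would. */
static size_t count_rows_naive(const char *data)
{
    size_t rows = 0;

    assert(data != NULL);
    for (const char *p = data; *p != '\0'; p++)
        if (*p == '\n')
            rows++;
    return rows;
}
```

Given the two records `1,"a<LF>b",2<LF>` and `3,"c",4<LF>`, this reports three rows: the LF inside the quoted field splits the first record in two before any field-level code sees the quotes.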

Re: [HACKERS] multiline CSV fields

2004-11-29 Kris Jurka
On Mon, 29 Nov 2004, Andrew Dunstan wrote: Longer term I'd like to be able to have a command parameter that specifies certain fields as multiline and for those relax the line end matching restriction (and for others forbid multiline altogether). That would be a TODO for 8.1 though, along
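The longer-term idea quoted above (naming certain columns as multiline and forbidding multiline values elsewhere) could be checked per row roughly as follows. No such COPY parameter is defined in this thread; the `multiline_ok` array and `first_disallowed_multiline()` are hypothetical, and only the "forbid" half is shown. Relaxing line-end matching for the flagged columns would have to happen in the reader itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical per-column policy: columns flagged in multiline_ok may
 * contain CR or LF; any other column must not.
 * Returns the index of the first offending column, or -1 if the row is OK.
 */
static int first_disallowed_multiline(const char *const fields[],
                                      const bool multiline_ok[],
                                      int nfields)
{
    assert(nfields >= 0);
    for (int i = 0; i < nfields; i++)
    {
        if (multiline_ok[i])
            continue;                           /* embedded line ends allowed here */
        if (strpbrk(fields[i], "\r\n") != NULL)
            return i;                           /* CR or LF in a single-line column */
    }
    return -1;
}
```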

Re: [HACKERS] multiline CSV fields

2004-11-29 Bruce Momjian
Kris Jurka wrote: On Mon, 29 Nov 2004, Andrew Dunstan wrote: Longer term I'd like to be able to have a command parameter that specifies certain fields as multiline and for those relax the line end matching restriction (and for others forbid multiline altogether). That would be a

Re: [HACKERS] multiline CSV fields

2004-11-29 Tom Lane
Kris Jurka [EMAIL PROTECTED] writes: Endlessly extending the COPY command doesn't seem like a winning proposition to me and I think if we aren't comfortable telling every user to write a script to pre/post-process the data we should instead provide a bulk loader/unloader that transforms

Re: [HACKERS] multiline CSV fields

2004-11-29 Bruce Momjian
Tom Lane wrote: Kris Jurka [EMAIL PROTECTED] writes: Endlessly extending the COPY command doesn't seem like a winning proposition to me and I think if we aren't comfortable telling every user to write a script to pre/post-process the data we should instead provide a bulk

Re: [HACKERS] multiline CSV fields

2004-11-29 Andrew Dunstan
Tom Lane wrote: Kris Jurka [EMAIL PROTECTED] writes: Endlessly extending the COPY command doesn't seem like a winning proposition to me and I think if we aren't comfortable telling every user to write a script to pre/post-process the data we should instead provide a bulk loader/unloader

Re: [HACKERS] multiline CSV fields

2004-11-29 Greg Stark
Andrew Dunstan [EMAIL PROTECTED] writes: The advantage of having it in COPY is that it can be done serverside direct from the file system. For massive bulk loads that might be a plus, although I don't know what the protocol+socket overhead is. Actually even if you use client-side COPY it's

Re: [HACKERS] multiline CSV fields

2004-11-28 Bruce Momjian
OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. I have added this to the open items list: * fix COPY CSV with \r,\n in data My feeling is that if we are in a quoted string we just process whatever characters we find, even passing through an

Re: [HACKERS] multiline CSV fields

2004-11-28 Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces inconsistencies

Re: [HACKERS] multiline CSV fields

2004-11-28 Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces

Re: [HACKERS] multiline CSV fields

2004-11-28 Andrew Dunstan
Bruce Momjian said: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the

Re: [HACKERS] multiline CSV fields

2004-11-12 Andrew Dunstan
This example should fail on data line 2 or 3 on any platform, regardless of the platform's line-end convention, although I haven't tested on Windows. cheers andrew [[EMAIL PROTECTED] inst]$ bin/psql -e -f csverr.sql ; od -c /tmp/csverrtest.csv create table csverrtest (a int, b text, c int);

Re: [HACKERS] multiline CSV fields

2004-11-12 Patrick B Kelly
On Nov 12, 2004, at 12:20 AM, Tom Lane wrote: Patrick B Kelly [EMAIL PROTECTED] writes: I may not be explaining myself well or I may fundamentally misunderstand how copy works. Well, you're definitely ignoring the character-set-conversion issue. I was not trying to ignore the character set and

Re: [HACKERS] multiline CSV fields

2004-11-11 Patrick B Kelly
On Nov 10, 2004, at 6:10 PM, Andrew Dunstan wrote: The last really isn't an option, because the whole point of CSVs is to play with other programs, and my understanding is that those that understand multiline fields (e.g. Excel) expect them not to be escaped, and do not produce them escaped.

Re: [HACKERS] multiline CSV fields

2004-11-11 Andrew Dunstan
Patrick B Kelly wrote: On Nov 10, 2004, at 6:10 PM, Andrew Dunstan wrote: The last really isn't an option, because the whole point of CSVs is to play with other programs, and my understanding is that those that understand multiline fields (e.g. Excel) expect them not to be escaped, and do not

Re: [HACKERS] multiline CSV fields

2004-11-11 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Patrick B Kelly wrote: Actually, when I try to export a sheet with multi-line cells from excel, it tells me that this feature is incompatible with the CSV format and will not include them in the CSV file. It probably depends on the version. I have

Re: [HACKERS] multiline CSV fields

2004-11-11 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Patrick B Kelly wrote: Actually, when I try to export a sheet with multi-line cells from excel, it tells me that this feature is incompatible with the CSV format and will not include them in the CSV file. It probably

Re: [HACKERS] multiline CSV fields

2004-11-11 Greg Stark
Tom Lane [EMAIL PROTECTED] writes: I would vote in favor of removing the current code that attempts to support unquoted newlines, and waiting to see if there are complaints. Uhm. *raises hand* I agree with your argument but one way or another I have to load these CSVs I'm given. And like it

Re: [HACKERS] multiline CSV fields

2004-11-11 David Fetter
On Thu, Nov 11, 2004 at 03:38:16PM -0500, Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: I would vote in favor of removing the current code that attempts to support unquoted newlines, and waiting to see if there are complaints. Uhm. *raises hand* I agree with your argument

Re: [HACKERS] multiline CSV fields

2004-11-11 Patrick B Kelly
On Nov 11, 2004, at 2:56 PM, Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Patrick B Kelly wrote: Actually, when I try to export a sheet with multi-line cells from excel, it tells me that this feature is incompatible with the CSV format and will not include them

Re: [HACKERS] multiline CSV fields

2004-11-11 Andrew Dunstan
Patrick B Kelly wrote: What about just coding a FSM into backend/commands/copy.c:CopyReadLine() that does not process any flavor of NL characters when it is inside of a data field? It would be a major change - the routine doesn't read data a field at a time, and has no idea if we are even

Re: [HACKERS] multiline CSV fields

2004-11-11 Tom Lane
Patrick B Kelly [EMAIL PROTECTED] writes: What about just coding a FSM into backend/commands/copy.c:CopyReadLine() that does not process any flavor of NL characters when it is inside of a data field? CopyReadLine has no business tracking that. One reason why not is that it is dealing with

Re: [HACKERS] multiline CSV fields

2004-11-11 Patrick B Kelly
On Nov 11, 2004, at 6:16 PM, Tom Lane wrote: Patrick B Kelly [EMAIL PROTECTED] writes: What about just coding a FSM into backend/commands/copy.c:CopyReadLine() that does not process any flavor of NL characters when it is inside of a data field? CopyReadLine has no business tracking that. One

Re: [HACKERS] multiline CSV fields

2004-11-11 Andrew Dunstan
Patrick B Kelly wrote: My suggestion is to simply have CopyReadLine recognize these two states (in-field and out-of-field) and execute the current logic only while in the second state. It would not be too hard but as you mentioned it is non-trivial. We don't know what state we expect the
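Patrick's two-state proposal (in-field vs. out-of-field) can be sketched as a record-boundary scanner that treats CR and LF as ordinary data while inside a quoted field. This assumes the buffer is already in a single-byte, ASCII-compatible encoding and that the quote character is `"` with a doubled quote as its escape (the CSV defaults). Tom's objection is precisely that CopyReadLine runs before character-set conversion, so this sketch sidesteps rather than answers it. `find_record_end()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the offset just past the first record terminator (LF, CR or
 * CRLF) that lies outside a quoted field, or len if none is found.
 * A doubled quote ("") inside a quoted field toggles the state twice,
 * so the scanner correctly stays in-field.
 */
static size_t find_record_end(const char *buf, size_t len)
{
    bool in_quotes = false;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '"')
            in_quotes = !in_quotes;            /* enter or leave a quoted field */
        else if (!in_quotes && c == '\n')
            return i + 1;                      /* LF ends the record */
        else if (!in_quotes && c == '\r')      /* CR or CRLF ends the record */
            return (i + 1 < len && buf[i + 1] == '\n') ? i + 2 : i + 1;
        /* inside quotes, CR and LF fall through as ordinary data */
    }
    return len;
}
```

On the two records `1,"a<LF>b",2<LF>3,"c",4<LF>` the first call returns 10, skipping the quoted LF, and a second call on the remainder returns 8.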

Re: [HACKERS] multiline CSV fields

2004-11-11 Patrick B Kelly
On Nov 11, 2004, at 10:07 PM, Andrew Dunstan wrote: Patrick B Kelly wrote: My suggestion is to simply have CopyReadLine recognize these two states (in-field and out-of-field) and execute the current logic only while in the second state. It would not be too hard but as you mentioned it is

Re: [HACKERS] multiline CSV fields

2004-11-11 Bruce Momjian
Can I see an example of such a failure line? --- Andrew Dunstan wrote: Darcy Buskermolen has drawn my attention to unfortunate behaviour of COPY CSV with fields containing embedded line end chars if the embedded

Re: [HACKERS] multiline CSV fields

2004-11-11 Tom Lane
Patrick B Kelly [EMAIL PROTECTED] writes: I may not be explaining myself well or I may fundamentally misunderstand how copy works. Well, you're definitely ignoring the character-set-conversion issue. regards, tom lane

[HACKERS] multiline CSV fields

2004-11-10 Andrew Dunstan
Darcy Buskermolen has drawn my attention to unfortunate behaviour of COPY CSV with fields containing embedded line end chars if the embedded sequence isn't the same as those of the file containing the CSV data. In that case we error out when reading the data in. This means there are cases