Re: [HACKERS] COPY formatting
Karel Zak [EMAIL PROTECTED] writes: On Fri, Mar 19, 2004 at 09:54:37AM -0500, Tom Lane wrote: Why? The client-side code doesn't have any real say over the meaning of the data, at least not in psql-class clients. I suppose a client app that tries to interpret the data could get confused, but psql sure doesn't do that. libpq, fe-protocol2.c, pqGetCopyData2() and pqGetline2() cut string behind '\n'. Sure, but that doesn't invalidate the data stream as a whole, it's just a bufferload boundary choice that won't be very helpful for clients not using a newline-based data layout. In any case, v2 protocol is obsolete and needn't limit our thoughts about what to do in future. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
On Fri, Mar 19, 2004 at 09:54:37AM -0500, Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. Why? The client-side code doesn't have any real say over the meaning of the data, at least not in psql-class clients. I suppose a client app that tries to interpret the data could get confused, but psql sure doesn't do that. libpq, fe-protocol2.c, pqGetCopyData2() and pqGetline2() cut string behind '\n'. Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] COPY formatting
On Thu, Mar 18, 2004 at 10:16:36AM -0500, Tom Lane wrote: Passing in a relation OID is probably a bad idea anyway, as it ties this API to the assumption that COPY is only for complete relations. There's been talk before of allowing a SELECT result to be presented via the COPY protocol, for instance. What might be a more usable API is COPY OUT: function formatter_out(text[]) returns text COPY IN: function formatter_in(text) returns text[] where the text array is either the results of or the input to the per-column datatype I/O routines. This makes it explicit that the formatter's job is solely to determine the column-level wrapping and unwrapping of the data. I'm assuming here that there is no good reason for the formatter to care about the specific datatypes involved; can you give a counterexample? The idea was put maximum information about tuple to formatter, and what will formatter do with this information is a formatter problem. It's pity that main idea of current COPY is based on separated lines and it is not more common interface for streaming data between FE and BE. Yeah, that was another concern I had. This API would let the formatter control line-level layout but it would not eliminate the hard-wired significance of newline. What's worse, there isn't any clean way to deal with reading quoted newlines --- the formatter can't really replace the default quoting rules if the low-level code is going to decide whether a newline is quoted or not. I think latest protocol version works with blocks of data and no with lines and client PQputCopyData() returns a block -- only docs says that it is row of table. We could possibly solve that by specifying that the text output or input (respectively) is the complete line sent to or from the client, including newline or whatever other line-level formatting you are using. This still leaves the problem of how the low-level COPY IN code knows what is a complete line to pass off to the formatter_in routine. We could possibly fix this by adding a second input-control routine function formatter_linelength(text) returns integer which is defined to return -1 if the input isn't a complete line yet But formatter_linelength() will need some context information I think. The others words some struct with formatter specific internal data. And for more difficult formats like XML you need some others context data (parser data) too. Maybe there can be some global exported struct (like for triggers) and functions that is written in C can use it. It means for simple formats like CSV you can use non-C functions and for formats like XML you can use C functions. And if it will intereting for PL developers they can add support for access to this structs to their languages. (i.e., read some more data, append to the buffer, and try again), or = 0 to indicate that the first N bytes of the buffer represent a complete line to be passed off to formatter_in. I don't see a way to combine formatter_in and formatter_linelength into a single function without relying on out parameters, which would again confine the feature to format functions written in C. It's a tad annoying that we need two functions for input. One way that we could still keep the COPY option syntax to be just FORMAT csv is to create an arbitrary difference in the signatures of the input functions. Then we could have coexisting functions csv(text[]) returns text csv(text) returns text[] csv(text, ...) returns int that are referenced by FORMAT csv. It sounds good, but I think we both not full sure about it now, right? CSV support will probably better add by DELIMITER extension. Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY formatting
[EMAIL PROTECTED] (Andrew Dunstan) writes: Karel Zak wrote: Hi, in TODO is item: * Allow dump/load of CSV format. I don't think it's clean idea. Why CSV and why not something other? :-) A why not allow to users full control of the format by they own function. It means something like: COPY tablename [ ( column [, ...] ) ] TO { 'filename' | STDOUT } [ [ WITH ] [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] 'delimiter' ] [ NULL [ AS ] 'null string' ] [ FORMAT funcname ] ] The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) -- it's pseudocode of course, it should be use standard fmgr interface. It's probably interesting for non-binary COPY version. Interesting ... The alternative might be an external program to munge CSVs and whatever other format people want to support and then call the exisiting COPY- either in bin or contrib. I have seen lots of people wanting to import CSVs, and that's even before we get a Windows port. I know Jan Wieck has been working on something like this, with a bit of further smarts... - By having, alongside, a table definition, the table can be created concurrently; - A set of mapping functions can be used, so that if, for instance, the program generating the data was Excel, and you have a field with values like 37985, 38045, or 38061, they can respectively be mapped to '2004-01-01', '2004-03-01', and '2004-03-17'; - It can load whatever data is loadable, and use Ethernet-like backoffs when it encounters bad records so that it loads all the data that is good, and leaves a bundle of 'crud' that is left over. He had been prototyping it in Tcl; I'm not sure how far a port to C has gotten. It looked pretty neat; it sure seems better to put the cleverness in userspace than to try to increase the complexity of the postmaster... -- output = (cbbrowne @ cbbrowne.com) http://cbbrowne.com/info/linuxxian.html Have you heard of the new Macsyma processor? It has three instructions -- LOAD, STORE, and SKIP IF INTEGRABLE. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] COPY formatting
Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: John, Smith, The White House 1600 Pennsylvania Avenue NW Washington, DC 20500, male, open source software office It is one record. (Long Live President Smith!) I have never seen such a beast, and if I had been asked off the cuff would have said that it was probably illegal, except that I know of no standard actually defining the format. Perhaps others (Joshua Drake or Josh Berkus?) have wider experience. I think in any case we should ignore those for now and handle the straightforward case. I *have* seen monstrosities like fields that do not begin with the quote character but then break into a quote, e.g.: 1,2,a,123abcdef,6,7,8 It's difficult to say it :-), but your DELIMITER idea is better than my suggested API. Andrew, go ahead. I thought about some data streaming, but COPY is probably bad place for it. Just to be clear, I have no current plans for doing this. I have about 3 pg related things I am working on, in addition to doing paid work so I can eat! If I do it it will not be for quite a while. There are some wrinkles, though, concerning the interaction of CSV's notion of escaping and COPY's notion of escaping. If someone want to undertake this I can flesh those out in a further email. Someone was asking the other day about a newbie or student project - this might be a good one (impact relatively isolated, very easy to test) although possibly not as sexy as some might like. cheers andrew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] COPY formatting
Karel Zak [EMAIL PROTECTED] writes: It's pity that main idea of current COPY is based on separated lines and it is not more common interface for streaming data between FE and BE. Yeah, that was another concern I had. This API would let the formatter control line-level layout but it would not eliminate the hard-wired significance of newline. What's worse, there isn't any clean way to deal with reading quoted newlines --- the formatter can't really replace the default quoting rules if the low-level code is going to decide whether a newline is quoted or not. I think latest protocol version works with blocks of data and no with lines and client PQputCopyData() returns a block -- only docs says that it is row of table. But you can't assume that the client will send blocks that are semantically significant. For instance, if psql is reading a file to send with \copy, how's it going to know how the file is formatted? It's just gonna send disk-block-sized messages, and the backend has to discover the semantic boundaries for itself. It sounds good, but I think we both not full sure about it now, right? CSV support will probably better add by DELIMITER extension. Yeah, without people beating on our door for such a hook, it seems like Andrew's DELIMITER idea is the best thing to do for now. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] COPY formatting
Andrew Dunstan wrote: Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: John, Smith, The White House 1600 Pennsylvania Avenue NW Washington, DC 20500, male, open source software office It is one record. (Long Live President Smith!) I have never seen such a beast, Export from a spreadsheet where people have formated the cell with the address on it. Regards, Fernando ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
On Fri, Mar 19, 2004 at 09:39:58AM -0500, Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: It's pity that main idea of current COPY is based on separated lines and it is not more common interface for streaming data between FE and BE. Yeah, that was another concern I had. This API would let the formatter control line-level layout but it would not eliminate the hard-wired significance of newline. What's worse, there isn't any clean way to deal with reading quoted newlines --- the formatter can't really replace the default quoting rules if the low-level code is going to decide whether a newline is quoted or not. I think latest protocol version works with blocks of data and no with lines and client PQputCopyData() returns a block -- only docs says that it is row of table. But you can't assume that the client will send blocks that are semantically significant. For instance, if psql is reading a file to send with \copy, how's it going to know how the file is formatted? And what \n in attibutes data in CSV? I think CSV format doesn't use some escape for newline char. It means psql with \copy cannot be sure with CSV. Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] COPY formatting
Karel Zak [EMAIL PROTECTED] writes: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. Why? The client-side code doesn't have any real say over the meaning of the data, at least not in psql-class clients. I suppose a client app that tries to interpret the data could get confused, but psql sure doesn't do that. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
Fernando Nasser wrote: Andrew Dunstan wrote: Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: John, Smith, The White House 1600 Pennsylvania Avenue NW Washington, DC 20500, male, open source software office It is one record. (Long Live President Smith!) I have never seen such a beast, Export from a spreadsheet where people have formated the cell with the address on it. Well, I just tried with OpenOffice on my RH9 box, and it translated the line breaks in the cell into spaces in the CSV. When I replaced them with line breaks in a text editor and reloaded it treated them as separate rows. I don't have a Windows box handy on which I can test Excel's behaviour. cheers andrew ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
Andrew Dunstan [EMAIL PROTECTED] writes: There are some wrinkles, though, concerning the interaction of CSV's notion of escaping and COPY's notion of escaping. If someone want to undertake this I can flesh those out in a further email. Please do that, so that the info is in the archives in case someone else wants to tackle the project. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
I wrote: Fernando Nasser wrote: [snip re multiline CSV fields] I have never seen such a beast, Export from a spreadsheet where people have formated the cell with the address on it. Well, I just tried with OpenOffice on my RH9 box, and it translated the line breaks in the cell into spaces in the CSV. When I replaced them with line breaks in a text editor and reloaded it treated them as separate rows. I don't have a Windows box handy on which I can test Excel's behaviour. Clearly my experience is out of date - I just found a web ref to Excel doing just this. Oh, well, it doesn't seem to me an insurmountable problem. cheers andrew ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] COPY formatting
Karel Zak [EMAIL PROTECTED] writes: And what \n in attibutes data in CSV? I think CSV format doesn't use some escape for newline char. It means psql with \copy cannot be sure with CSV. I think CSV only allows embedded newlines that are either escaped, or inside quotes. COPY doesn't currently have the notion of a quote character, but that was part of Andrew's proposal ... regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
quote who=Andrew Dunstan Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: John, Smith, The White House 1600 Pennsylvania Avenue NW Washington, DC 20500, male, open source software office It is one record. (Long Live President Smith!) I have never seen such a beast, and if I had been asked off the cuff would have said that it was probably illegal, except that I know of no standard actually defining the format. Perhaps others (Joshua Drake or Josh Berkus?) have wider experience. I think in any case we should ignore those for now and handle the straightforward case. I *have* seen monstrosities like fields that do not begin with the quote character but then break into a quote, e.g.: 1,2,a,123abcdef,6,7,8 I have dealt with both, frequently. The CSV format allows you to begin a block of text with the quote. Newlines are included in that quoted space. If qoutes are included in the field then the quotes are double quotes to denote they are not part of the quoted space. Also, the following is valid. 1,2,, is empty. 1,2,3,, The 4 quotes denote a single double quote. Writing simple CSV converts that just explode on commas and newlines miss these oddities. Try exporting an Access table with a Memo field (containg multiple lines) to CSV. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
Bruce Momjian wrote: OK, so for the separator, quote, and escape options: separator is , quote is escape is so if the quote and escape are the same, then a double denotes a single? Yes. i.e. with the above settings abcdef - abcdef cheers andrew ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
Josh Berkus wrote: Overall, I assert again that approaching this issue through COPY enhancements is really not the way to go.We should be looking at a client utility, like pg_import and pg_export. The primary purpose of COPY is bulk loads for backup/restore, and I'm against doing a lot of tinkering which might make it less efficient or introduce new issues into what's currently very reliable. That's not unreasonable. I floated my idea as an alternative to a much more radical proposal. If we decided against it we should remove the TODO item. As against that, if we don't do this then I think we should embrace these utility programs more, possibly bringing them into the distribution. cheers andrew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] COPY formatting
Andrew Dunstan wrote: Josh Berkus wrote: Overall, I assert again that approaching this issue through COPY enhancements is really not the way to go.We should be looking at a client utility, like pg_import and pg_export. The primary purpose of COPY is bulk loads for backup/restore, and I'm against doing a lot of tinkering which might make it less efficient or introduce new issues into what's currently very reliable. That's not unreasonable. I floated my idea as an alternative to a much more radical proposal. If we decided against it we should remove the TODO item. As against that, if we don't do this then I think we should embrace these utility programs more, possibly bringing them into the distribution. CSV seems to be the most widely requested conversion format. Anything else is probably a one-off job that should be done in perl or sed. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] COPY formatting
On Thu, Mar 18, 2004 at 07:48:40AM +0100, Karel Zak wrote: On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) This seems like it could only reasonably be implemented as a C function. Why? I said it's pseudo code. It should use standard fmgr API like every other PostgreSQL function or is it problem and I overlook something? It must to support arbitrary programming language and not C only. Well, I look over the COPY code and best will start with hardcoded version, but make it modular in code and if all will right we can think about some interface for others formats definition. OK? It's pity that main idea of current COPY is based on separated lines and it is not more common interface for streaming data between FE and BE. Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] COPY formatting
On Thu, Mar 18, 2004 at 09:29:03AM +, Lee Kindness wrote: To be honest this idea strikes me as overkill - over engineering. It was suggestion, maybe you're right :-) While i have done a lot of messing around reading/writing the binary format (and been stung by changes in that format) if you are using this format then you're 99% likely to be in control of the incoming/outgoing data and thus able to format to your wishes outwith COPY. I partly agree. But.. there is possible write directly final file by backend without data transfer to client. If we want to support this feature we need control output format by server... And.. I can image format that is use for BE/FE data transfer only and not for some final data presentation. For example compression of data stream from/to BE without PostgreSQL protocol change. Something else in the TODO regarding COPY is XML import/export, and for this to be supported in your proposed implementation the function would need to be passed in a heap more information. Yes, very probably some struct with all COPY information and format specific stuff. Tom was right that in this case it will C functions only. As I said I will try implement it without user defined function call for format conversion, but I will do it modular and in future we can create some interface for user defined formats. Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
Lee Kindness wrote: To be honest this idea strikes me as overkill - over engineering. While there is a clear need for proper CSV import (i.e. just setting DELIMITER to ',' doesn't work due to ','s in strings) I cannot see how this would prove useful, or who would use it? I agree. My modest proposal for handling CSVs would be to extend the DELIMITER parameter to allow up to 3 characters - separator, quote and escape. Escape would default to the quote char and the quote char would default to unspecified. This would involve no grammar changes and fairly isolated and small code changes, I believe. In the most common CSV cases you would just use $$,$$ or $$,'$$. :-) COPY is basically line/tuple oriented, and that alone would exclude many file formats (e.g. imagine wanting to import a spreadsheet where each worksheet was the table name and the first row on each worksheet was the field names - I have seen such beasts more than once). If we want a general facility for loading and exporting foreign file formats, I really believe that is the province of a utility program rather than a database engine function. The reason in my mind for making CSV a special case is that it is very easy to do and so often asked for. (I used to set parsing CSVs as a basic programming exercise - it is amazing how many way people find to get it wrong). cheers andrew ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY formatting
Andrew Dunstan wrote: Lee Kindness wrote: To be honest this idea strikes me as overkill - over engineering. While there is a clear need for proper CSV import (i.e. just setting DELIMITER to ',' doesn't work due to ','s in strings) I cannot see how this would prove useful, or who would use it? I agree. My modest proposal for handling CSVs would be to extend the DELIMITER parameter to allow up to 3 characters - separator, quote and escape. Escape would default to the quote char and the quote char would default to unspecified. This would involve no grammar changes and fairly isolated and small code changes, I believe. In the most common CSV cases you would just use $$,$$ or $$,'$$. :-) COPY is basically line/tuple oriented, and that alone would exclude many file formats (e.g. imagine wanting to import a spreadsheet where each worksheet was the table name and the first row on each worksheet was the field names - I have seen such beasts more than once). If we want a general facility for loading and exporting foreign file formats, I really believe that is the province of a utility program rather than a database engine function. The reason in my mind for making CSV a special case is that it is very easy to do and so often asked for. (I used to set parsing CSVs as a basic programming exercise - it is amazing how many way people find to get it wrong). I like the separator, quote, and escape idea. It allows variety without requiring folks to code in C. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY formatting
Karel Zak [EMAIL PROTECTED] writes: On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: This seems like it could only reasonably be implemented as a C function. Why? I said it's pseudo code. It should use standard fmgr API like every other PostgreSQL function or is it problem and I overlook something? It must to support arbitrary programming language and not C only. Sure, but the question is whether the *stuff it has to do* can reasonably be coded in anything but C. Why are you passing in a relation OID, if not for lookups in relcache entries that are simply not accessible above the C level? (Don't tell me you want the function to do a bunch of actual SELECTs from system catalogs for every line of the copy...) Passing in a relation OID is probably a bad idea anyway, as it ties this API to the assumption that COPY is only for complete relations. There's been talk before of allowing a SELECT result to be presented via the COPY protocol, for instance. What might be a more usable API is COPY OUT: function formatter_out(text[]) returns text COPY IN: function formatter_in(text) returns text[] where the text array is either the results of or the input to the per-column datatype I/O routines. This makes it explicit that the formatter's job is solely to determine the column-level wrapping and unwrapping of the data. I'm assuming here that there is no good reason for the formatter to care about the specific datatypes involved; can you give a counterexample? It's pity that main idea of current COPY is based on separated lines and it is not more common interface for streaming data between FE and BE. Yeah, that was another concern I had. This API would let the formatter control line-level layout but it would not eliminate the hard-wired significance of newline. What's worse, there isn't any clean way to deal with reading quoted newlines --- the formatter can't really replace the default quoting rules if the low-level code is going to decide whether a newline is quoted or not. We could possibly solve that by specifying that the text output or input (respectively) is the complete line sent to or from the client, including newline or whatever other line-level formatting you are using. This still leaves the problem of how the low-level COPY IN code knows what is a complete line to pass off to the formatter_in routine. We could possibly fix this by adding a second input-control routine function formatter_linelength(text) returns integer which is defined to return -1 if the input isn't a complete line yet (i.e., read some more data, append to the buffer, and try again), or = 0 to indicate that the first N bytes of the buffer represent a complete line to be passed off to formatter_in. I don't see a way to combine formatter_in and formatter_linelength into a single function without relying on out parameters, which would again confine the feature to format functions written in C. It's a tad annoying that we need two functions for input. One way that we could still keep the COPY option syntax to be just FORMAT csv is to create an arbitrary difference in the signatures of the input functions. Then we could have coexisting functions csv(text[]) returns text csv(text) returns text[] csv(text, ...) returns int that are referenced by FORMAT csv. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] COPY formatting
Andrew Dunstan [EMAIL PROTECTED] writes: Lee Kindness wrote: To be honest this idea strikes me as overkill - over engineering. I agree. My modest proposal for handling CSVs would be to extend the DELIMITER parameter to allow up to 3 characters - separator, quote and escape. Escape would default to the quote char and the quote char would default to unspecified. I could go with that too. The question here is do we have any popular use-cases that aren't solved by that extension, but could be solved by simple user-level data formatting functions? I'm not real eager to add such a feature as an if we build it they will come speculation, but if anyone can point to solid use-cases besides handling CSV, then it probably is worth doing. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Lee Kindness wrote: To be honest this idea strikes me as overkill - over engineering. I agree. My modest proposal for handling CSVs would be to extend the DELIMITER parameter to allow up to 3 characters - separator, quote and escape. Escape would default to the quote char and the quote char would default to unspecified. I could go with that too. The question here is do we have any popular use-cases that aren't solved by that extension, but could be solved by simple user-level data formatting functions? I'm not real eager to add such a feature as an if we build it they will come speculation, but if anyone can point to solid use-cases besides handling CSV, then it probably is worth doing. The thing I liked about Andrew's idea is that it even covers escape quoting for CVS, which might change from implementation to implentation, and it is flexible without requiring C coding. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] COPY formatting
Karel, Andrew, Fernando: On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) No offense, but isn't this whole thing more appropriate for a client program? Like the pg_import and pg_export projects on GBorg? Has anyone looked at those projects? I can see making a special provision for CSV in COPY, just because it's such a universal format. But I personally don't see that a complex, sophisticated import/export formatter belongs on the SQL command line. Particularly since most users will want a GUI to handle it. And, BTW, I deal with CSV *all the time* for my insurance clients, and I can tell you that that format hasn't changed in 20 years. We can hard-code it if it's easier. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] COPY formatting
And, BTW, I deal with CSV *all the time* for my insurance clients, and I can tell you that that format hasn't changed in 20 years. We can hard-code it if it's easier. Well many of my clients consider CSV Character Separated Value not Comma... Thus I get data like this: Hello,Good Bye Hello Good Bye Hello,Good Bye This, They're ThisThey're ThisIs A 1 Dealing with all of these different nuances is may or may not be beyond the scope of copy but it seems that it could be something that it can handle. Python has a csv module that allows you to assign dialects to any specific type of import you are performing. Sincerely, Joshua D. Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL begin:vcard fn:Joshua D. Drake n:Drake;Joshua D. org:Command Prompt, Inc. adr:;;PO Box 215;Cascade Locks;Oregon;97014;USA email;internet:[EMAIL PROTECTED] title:Consultant tel;work:503-667-4564 tel;fax:503-210-0034 note:Command Prompt, Inc. is the largest and oldest US based commercial PostgreSQL support provider. We provide the only commercially viable integrated PostgreSQL replication solution, but also custom programming, and support. We authored the book Practical PostgreSQL, the procedural language plPHP, and adding trigger capability to plPerl. x-mozilla-html:FALSE url:http://www.commandprompt.com/ version:2.1 end:vcard ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] COPY formatting
Tom Lane [EMAIL PROTECTED] writes: I could go with that too. The question here is do we have any popular use-cases that aren't solved by that extension, but could be solved by simple user-level data formatting functions? I'm not real eager to add such a feature as an if we build it they will come speculation, but if anyone can point to solid use-cases besides handling CSV, then it probably is worth doing. (I can't believe I'm saying this, but) It seems like xml output would be the use case you're looking for. -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] COPY formatting
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I could go with that too. The question here is do we have any popular use-cases that aren't solved by that extension, but could be solved by simple user-level data formatting functions? (I can't believe I'm saying this, but) It seems like xml output would be the use case you're looking for. Does that fall into the category of stuff that could be supported by the kind of API we're talking about? I should think that XML would need a much more global view of the data, not just line-by-line reformatting. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] COPY formatting
Karel Zak wrote: Hi, in TODO is item: * Allow dump/load of CSV format. I don't think it's clean idea. Why CSV and why not something other? :-) A why not allow to users full control of the format by they own function. It means something like: COPY tablename [ ( column [, ...] ) ] TO { 'filename' | STDOUT } [ [ WITH ] [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] 'delimiter' ] [ NULL [ AS ] 'null string' ] [ FORMAT funcname ] ] The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) -- it's pseudocode of course, it should be use standard fmgr interface. It's probably interesting for non-binary COPY version. Comments? Karel Karel, This seems to be an excellent idea. People have already asked for many different formats. Usually I recommend them to use psql -c COPY ... dbname | awk Since Windows will be supported soon, it will be hard to pipe data to a useful program (awk, sed, ...). Maybe this feature would help a lot in this case. Regards, Hans -- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/2952/30706 or +43/664/233 90 75 www.cybertec.at, www.postgresql.at, kernel.cybertec.at ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] COPY formatting
Hans-Jürgen Schönig wrote: Karel Zak wrote: Hi, in TODO is item: * Allow dump/load of CSV format. I don't think it's clean idea. Why CSV and why not something other? :-) A why not allow to users full control of the format by they own function. It means something like: COPY tablename [ ( column [, ...] ) ] TO { 'filename' | STDOUT } [ [ WITH ] [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] 'delimiter' ] [ NULL [ AS ] 'null string' ] [ FORMAT funcname ] ] The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) -- it's pseudocode of course, it should be use standard fmgr interface. It's probably interesting for non-binary COPY version. Comments? Karel Karel, This seems to be an excellent idea. People have already asked for many different formats. Usually I recommend them to use psql -c COPY ... dbname | awk Since Windows will be supported soon, it will be hard to pipe data to a useful program (awk, sed, ...). Maybe this feature would help a lot in this case. Could a CSV-generating function be provided with the distribution then? Regards, Fernando ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY formatting
Karel Zak [EMAIL PROTECTED] writes: The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) This seems like it could only reasonably be implemented as a C function. I can't really imagine the average user of COPY wanting to write C in preference to, say, an external perl script. What's the real use-case for the feature? regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] COPY formatting
Karel Zak wrote: Hi, in TODO is item: * Allow dump/load of CSV format. I don't think it's clean idea. Why CSV and why not something other? :-) A why not allow to users full control of the format by they own function. It means something like: COPY tablename [ ( column [, ...] ) ] TO { 'filename' | STDOUT } [ [ WITH ] [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] 'delimiter' ] [ NULL [ AS ] 'null string' ] [ FORMAT funcname ] ] The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) -- it's pseudocode of course, it should be use standard fmgr interface. It's probably interesting for non-binary COPY version. Interesting ... The alternative might be an external program to munge CSVs and whatever other format people want to support and then call the exisiting COPY- either in bin or contrib. I have seen lots of people wanting to import CSVs, and that's even before we get a Windows port. cheers andrew ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) This seems like it could only reasonably be implemented as a C function. I can't really imagine the average user of COPY wanting to write C in preference to, say, an external perl script. What's the real use-case for the feature? That is why I suggested providing a pre-written/pre-compiled/installed function for CSV (call it CSV?). Advanced users could still write their own as people can write many other things if they know their ways. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] COPY formatting
That is why I suggested providing a pre-written/pre-compiled/installed function for CSV (call it CSV?). Advanced users could still write their own as people can write many other things if they know their ways. As someone who just went through a whole truckload of crap getting delimited files parsed from MSSQL to PostgreSQL. I believe yes this would be great thing. We ended up using plPython with the CSV module. Sincerely, Joshua Drake ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] COPY formatting
I deal with this daily in a cygwin environment. I wrote a simple c++ program where I hardcoded the input file name/location and output file name/location. I strip the quotation marks out where they are used for identifying text fields and change the comma's used as CSV's to pipes. I use a combination of bash scripting to execute the c++ program and then Perl to execute a stored procedure. I am new to Perl so I have not yet considered migrating it all into Perl. The dos2unix tools in cygwin always messed up the first character of the first line. I thought the real issue with the copy function and CSVs was that it did not like the use of quotations around the fields to identify text fields. For a true Windows port handling MS Excel files in their native format would be a goal I would hope. If your api could handle that then I would agree with your method. On Wed, 2004-03-17 at 22:14, Joshua D. Drake wrote: That is why I suggested providing a pre-written/pre-compiled/installed function for CSV (call it CSV?). Advanced users could still write their own as people can write many other things if they know their ways. As someone who just went through a whole truckload of crap getting delimited files parsed from MSSQL to PostgreSQL. I believe yes this would be great thing. We ended up using plPython with the CSV module. Sincerely, Joshua Drake ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings signature.asc Description: This is a digitally signed message part
Re: [HACKERS] COPY formatting
On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote: Karel Zak [EMAIL PROTECTED] writes: The formatting function API can be pretty simple: text *my_copy_format(text *attrdata, int direction, int nattrs, int attr, oid attrtype, oid relation) This seems like it could only reasonably be implemented as a C function. Why? I said it's pseudo code. It should use standard fmgr API like every other PostgreSQL function or is it problem and I overlook something? It must to support arbitrary programming language and not C only. I can't really imagine the average user of COPY wanting to write C in preference to, say, an external perl script. What's the real use-case for the feature? Don't hardcode any format to PostgreSQL, be open for others formats. Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly