Re: [HACKERS] COPY formatting
On Thu, Mar 18, 2004 at 10:16:36AM -0500, Tom Lane wrote: > Passing in a relation OID is probably a bad idea anyway, as it ties this > API to the assumption that COPY is only for complete relations. There's > been talk before of allowing a SELECT result to be presented via the > COPY protocol, for instance. What might be a more usable API is > > COPY OUT: > function formatter_out(text[]) returns text > COPY IN: > function formatter_in(text) returns text[] > > where the text array is either the results of or the input to the > per-column datatype I/O routines. This makes it explicit that the > formatter's job is solely to determine the column-level wrapping and > unwrapping of the data. I'm assuming here that there is no good reason > for the formatter to care about the specific datatypes involved; can you > give a counterexample? The idea was put maximum information about tuple to formatter, and what will formatter do with this information is a formatter problem. > > It's pity that main idea of current COPY is based on separated lines > > and it is not more common interface for streaming data between FE and BE. > > Yeah, that was another concern I had. This API would let the formatter > control line-level layout but it would not eliminate the hard-wired > significance of newline. What's worse, there isn't any clean way to > deal with reading quoted newlines --- the formatter can't really replace > the default quoting rules if the low-level code is going to decide > whether a newline is quoted or not. I think latest protocol version works with blocks of data and no with lines and client PQputCopyData() returns a block -- only docs says that it is row of table. > We could possibly solve that by specifying that the text output or input > (respectively) is the complete line sent to or from the client, > including newline or whatever other line-level formatting you are using. > This still leaves the problem of how the low-level COPY IN code knows > what is a complete line to pass off to the formatter_in routine. We > could possibly fix this by adding a second input-control routine > > function formatter_linelength(text) returns integer > > which is defined to return -1 if the input isn't a complete line yet But formatter_linelength() will need some context information I think. The others words some struct with formatter specific internal data. And for more difficult formats like XML you need some others context data (parser data) too. Maybe there can be some global exported struct (like for triggers) and functions that is written in C can use it. It means for simple formats like CSV you can use non-C functions and for formats like XML you can use C functions. And if it will intereting for PL developers they can add support for access to this structs to their languages. > (i.e., read some more data, append to the buffer, and try again), or > >= 0 to indicate that the first N bytes of the buffer represent a > complete line to be passed off to formatter_in. I don't see a way to > combine formatter_in and formatter_linelength into a single function > without relying on "out" parameters, which would again confine the > feature to format functions written in C. > It's a tad annoying that we need two functions for input. One way that > we could still keep the COPY option syntax to be just > FORMAT csv > is to create an arbitrary difference in the signatures of the input > functions. Then we could have coexisting functions > csv(text[]) returns text > csv(text) returns text[] > csv(text, ...) returns int > that are referenced by "FORMAT csv". It sounds good, but I think we both not full sure about it now, right? CSV support will probably better add by DELIMITER extension. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] pg_advisor schema proof of concept
Dear hackers, please find attached a quick proof of concept for a 'pg_advisor' schema. Well, the name is 'xpg_advisor' at the time, because it is not a system schema hence it cannot starts with 'pg_'. It appears that some support functions would be useful. I've noticed some are available from pg_catalog, but I have not found yet what I was looking for. If you do not like some advices, just "DROP VIEW the_advice;" If you do not like advices at all, just "DROP SCHEMA xpg_advisor;" Have a nice day, -- Fabien Coelho - [EMAIL PROTECTED]-- -- $Id: pg_advisor.sql,v 1.3 2004/03/19 10:41:47 coelho Exp $ -- -- pg_advisor maybe future system schema? -- -- at the time it is xpg_advisor as pg_ is reserved. -- DROP SCHEMA xpg_advisor CASCADE; CREATE SCHEMA xpg_advisor; COMMENT ON SCHEMA xpg_advisor IS 'various advices about database design or performance' ; SET search_path TO xpg_advisor,pg_catalog; -- -- DESIGN ADVICES (da_*) -- -- -- tables without primary keys -- CREATE VIEW da_tables_without_primary_key AS SELECT n.nspname AS Schema, c.relname AS Name FROM pg_class AS c JOIN pg_namespace AS n ON (c.relnamespace=n.oid) WHERE -- no comments about system catalogs. n.nspname NOT IN ('pg_catalog','pg_toast','information_schema','xpg_advisor') -- only tables AND c.relkind = 'r' -- no primary key AND NOT c.relhaspkey ORDER BY Schema ASC, Name ASC; COMMENT ON VIEW da_tables_without_primary_key IS 'it is better to have a primary key on your tables'; -- SELECT * FROM da_tables_without_primary_key; -- -- tables with composite primary keys? -- CREATE VIEW da_tables_with_composite_primary_key AS SELECT n.nspname AS Schema, c.relname AS Name FROM pg_class AS c JOIN pg_namespace AS n ON (c.relnamespace=n.oid) JOIN pg_constraint AS o ON (o.conrelid=c.oid) WHERE -- no comments about system catalogs. n.nspname NOT IN ('pg_catalog','pg_toast','information_schema','xpg_advisor') -- only tables AND c.relkind = 'r' -- with a primary key AND c.relhaspkey -- the primary key constraint AND contype = 'p' -- more than 1 element in conkey AND array_upper(o.conkey, 1)>1 ORDER BY Schema ASC, Name ASC; COMMENT ON VIEW da_tables_with_composite_primary_key IS 'it may be considered a better design to have simple primary keys'; -- SELECT * FROM da_tables_with_composite_primary_key; -- -- other ideas for later: -- -- foreign keys which do not exactly match their target key types? -- foreign keys which do match their target types but with different sizes? -- tables with large primary keys (such as TEXT, VARCHAR(64)...)? -- -- count summary of design advices? -- how to SELECT COUNT(*) FROM [the view of which I have the oid]? -- Do I need a function that count tuples in a table given its oid? -- CREATE FUNCTION pg_count_tuples(INTEGER oid) RETURNS INTEGER... CREATE VIEW design_advices_summary AS SELECT c.relname AS Name, d.description AS Description -- , pg_count_tuples(c.oid) AS Count FROM pg_class AS c JOIN pg_description AS d ON (d.objoid=c.oid) JOIN pg_namespace AS n ON (c.relnamespace=n.oid) WHERE -- xpg_advisor views n.nspname = 'xpg_advisor' -- only design advices AND c.relname LIKE 'da_%'; -- SELECT * FROM design_advices_summary; -- -- PERFORMANCE ADVICES (pa_*) -- -- no usable primary key index for foreign key referencial integrity checks? -- no usable index for foreigh key on deletes? -- -- count summary of performance advices? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] UnixWare/CVS Tip/initdb.c needs to use threads flags...
Larry Rosenman wrote: > > I thought that once you include libpthread in libpq, that you don't have > > to mention it again then you use libpq. Is your platform different > > somehow in this regard? > > > > I seem to remember this problem with libcrypt and libpq. Is this the > > same problem? > > > > I see that initdb is just the first of many /bin programs to be > > compiled, so if we have to add the thread lib, we will have to do it for > > all the bin programs. Yikes. Why wasn't this a problem for 7.4? > 7.4 had initdb as a Shell Script. > the 7.4.x libpq didn't have any pthread_* references in it, that I see > on my box. Ah, yes. We added the thread-local storage to handle SIGPIPE. The problem is that initdb isn't the only place. If you comment out initdb from the Makefile in src/bin, does the next make fail too? I bet it does. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [pgsql-www] [HACKERS] The Name Game: postgresql.net vs. pgfoundry.org
--- David Garamond <[EMAIL PROTECTED]> wrote: --snip -- > > Also, we're targetting the developers right? Please do not consider > ourselves as being too stupid to differentiate between postgresql.org > -- snip -- IMO this point of view is a short-sighted and narrow one. In addition to trying to bring a more structured and developed infrastructure to 3rd party developers this should also ease the difficulty many non-developers have in finding related software to the PostgreSQL project. This fact as well as the others should be taken into consideration when making the decision on which route to go. Chris Ryan __ Do you Yahoo!? Yahoo! Search - Find what youre looking for faster http://search.yahoo.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY formatting
[EMAIL PROTECTED] (Andrew Dunstan) writes: > Karel Zak wrote: > >> Hi, >> >> in TODO is item: "* Allow dump/load of CSV format". I don't think >> it's clean idea. Why CSV and why not something other? :-) >> >> A why not allow to users full control of the format by they own >> function. It means something like: >> COPY tablename [ ( column [, ...] ) ] >> TO { 'filename' | STDOUT } >> [ [ WITH ] [ BINARY ] >> [ OIDS ] >> [ DELIMITER [ AS ] 'delimiter' ] >> [ NULL [ AS ] 'null string' ] >> [ FORMAT funcname ] ] >> >> The formatting >> function API can be pretty simple: >> >> text *my_copy_format(text *attrdata, int direction, int >> nattrs, int attr, oid attrtype, oid relation) >> >> -- it's pseudocode of course, it should be use standard fmgr >> interface. >> It's probably interesting for non-binary COPY version. > > Interesting ... The alternative might be an external program to munge > CSVs and whatever other format people want to support and then call > the exisiting COPY- either in bin or contrib. I have seen lots of > people wanting to import CSVs, and that's even before we get a Windows > port. I know Jan Wieck has been working on something like this, with a bit of further smarts... - By having, alongside, a table definition, the table can be created concurrently; - A set of mapping functions can be used, so that if, for instance, the program generating the data was Excel, and you have a field with values like 37985, 38045, or 38061, they can respectively be mapped to '2004-01-01', '2004-03-01', and '2004-03-17'; - It can load whatever data is loadable, and use Ethernet-like backoffs when it encounters bad records so that it loads all the data that is good, and leaves a bundle of 'crud' that is left over. He had been prototyping it in Tcl; I'm not sure how far a port to C has gotten. It looked pretty neat; it sure seems better to put the "cleverness" in userspace than to try to increase the complexity of the postmaster... -- output = ("cbbrowne" "@" "cbbrowne.com") http://cbbrowne.com/info/linuxxian.html Have you heard of the new Macsyma processor? It has three instructions -- LOAD, STORE, and SKIP IF INTEGRABLE. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] COPY formatting
Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: "John", "Smith", "The White House 1600 Pennsylvania Avenue NW Washington, DC 20500", "male", "open source software office" It is one record. (Long Live President Smith!) I have never seen such a beast, and if I had been asked off the cuff would have said that it was probably illegal, except that I know of no standard actually defining the format. Perhaps others (Joshua Drake or Josh Berkus?) have wider experience. I think in any case we should ignore those for now and handle the straightforward case. I *have* seen monstrosities like fields that do not begin with the quote character but then break into a quote, e.g.: 1,2,a,123"abc""def",6,7,8 It's difficult to say it :-), but your DELIMITER idea is better than my suggested API. Andrew, go ahead. I thought about some data streaming, but COPY is probably bad place for it. Just to be clear, I have no current plans for doing this. I have about 3 pg related things I am working on, in addition to doing paid work so I can eat! If I do it it will not be for quite a while. There are some wrinkles, though, concerning the interaction of CSV's notion of escaping and COPY's notion of escaping. If someone want to undertake this I can flesh those out in a further email. Someone was asking the other day about a newbie or student project - this might be a good one (impact relatively isolated, very easy to test) although possibly not as sexy as some might like. cheers andrew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Inherited tables
On a recent thread Stephan Szabo mentioned some issues with inheritance. *** > On Thu, 19 Feb 2004, Stephan Szabo wrote: > > > There are two separate things here that are gotchas > > The first is that unique constraints don't inherit, and foreign keys must > > refer to a unique constraint. > > > These are both deficiencies in inheritance and the constraints in > > question. *** Anyone could comment on those problems? I am working on a design which will heavily use inherittance and I think on the long run the limitations above could make things to be more work for me. I would be willing to put some money forward to have someone work on improving those limitations. Anyone knows who may be a good candidate to work on this? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] COPY formatting
Karel Zak <[EMAIL PROTECTED]> writes: >>> It's pity that main idea of current COPY is based on separated lines >>> and it is not more common interface for streaming data between FE and BE. >> >> Yeah, that was another concern I had. This API would let the formatter >> control line-level layout but it would not eliminate the hard-wired >> significance of newline. What's worse, there isn't any clean way to >> deal with reading quoted newlines --- the formatter can't really replace >> the default quoting rules if the low-level code is going to decide >> whether a newline is quoted or not. > I think latest protocol version works with blocks of data and no with > lines and client PQputCopyData() returns a block -- only docs says that > it is row of table. But you can't assume that the client will send blocks that are semantically significant. For instance, if psql is reading a file to send with \copy, how's it going to know how the file is formatted? It's just gonna send disk-block-sized messages, and the backend has to discover the semantic boundaries for itself. > It sounds good, but I think we both not full sure about it now, right? > CSV support will probably better add by DELIMITER extension. Yeah, without people beating on our door for such a hook, it seems like Andrew's DELIMITER idea is the best thing to do for now. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] COPY formatting
Andrew Dunstan wrote: Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: "John", "Smith", "The White House 1600 Pennsylvania Avenue NW Washington, DC 20500", "male", "open source software office" It is one record. (Long Live President Smith!) I have never seen such a beast, Export from a spreadsheet where people have formated the cell with the address on it. Regards, Fernando ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
On Fri, Mar 19, 2004 at 09:39:58AM -0500, Tom Lane wrote: > Karel Zak <[EMAIL PROTECTED]> writes: > >>> It's pity that main idea of current COPY is based on separated lines > >>> and it is not more common interface for streaming data between FE and BE. > >> > >> Yeah, that was another concern I had. This API would let the formatter > >> control line-level layout but it would not eliminate the hard-wired > >> significance of newline. What's worse, there isn't any clean way to > >> deal with reading quoted newlines --- the formatter can't really replace > >> the default quoting rules if the low-level code is going to decide > >> whether a newline is quoted or not. > > > I think latest protocol version works with blocks of data and no with > > lines and client PQputCopyData() returns a block -- only docs says that > > it is row of table. > > But you can't assume that the client will send blocks that are > semantically significant. For instance, if psql is reading a file to > send with \copy, how's it going to know how the file is formatted? And what \n in attibutes data in CSV? I think CSV format doesn't use some escape for newline char. It means psql with \copy cannot be sure with CSV. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] COPY formatting
Karel Zak <[EMAIL PROTECTED]> writes: > The problem with CSV is that it will correctly work with new protocol > only. Because old versions of clients are newline sensitive. Why? The client-side code doesn't have any real say over the meaning of the data, at least not in psql-class clients. I suppose a client app that tries to interpret the data could get confused, but psql sure doesn't do that. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Question on restoring and compiled plans
Richard Huxton <[EMAIL PROTECTED]> writes: > If I pg_restore a DB and don't ANALYSE before a pl/pgsql trigger activates > then that trigger's going to be planned based on the default stats forever > isn't it? Only for the life of the current backend(s). If that's still too long for you, you could do a CREATE OR REPLACE FUNCTION that doesn't actually change anything about the trigger function; that will force current cached plans to be thrown away. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
Fernando Nasser wrote: Andrew Dunstan wrote: Karel Zak wrote: The problem with CSV is that it will correctly work with new protocol only. Because old versions of clients are newline sensitive. And CSV can contains newline in by quotation marks defined attributes: "John", "Smith", "The White House 1600 Pennsylvania Avenue NW Washington, DC 20500", "male", "open source software office" It is one record. (Long Live President Smith!) I have never seen such a beast, Export from a spreadsheet where people have formated the cell with the address on it. Well, I just tried with OpenOffice on my RH9 box, and it translated the line breaks in the cell into spaces in the CSV. When I replaced them with line breaks in a text editor and reloaded it treated them as separate rows. I don't have a Windows box handy on which I can test Excel's behaviour. cheers andrew ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_advisor schema proof of concept
> please find attached a quick proof of concept for a 'pg_advisor' schema. Here is a "larger" but nevertheless still quick proof of concept, alas with some buggy PL/pgSQL that I wrote with my little finger. It implements some foreign key type checks for which I submitted be patches some time ago. The more I think about it, the more I find it should be the way to go, rather than having a new external tool. -- Fabien Coelho - [EMAIL PROTECTED]-- -- $Id: pg_advisor.sql,v 1.13 2004/03/19 14:55:39 coelho Exp $ -- -- pg_advisor maybe future system schema? -- -- at the time it is xpg_advisor as pg_ is reserved. -- DROP SCHEMA xpg_advisor CASCADE; CREATE SCHEMA xpg_advisor; COMMENT ON SCHEMA xpg_advisor IS 'various advices about database design or performance' ; SET search_path TO xpg_advisor,pg_catalog; -- -- DESIGN ADVICES (da_*) -- -- -- tables without primary keys -- CREATE VIEW da_tables_without_primary_key AS SELECT n.nspname AS Schema, c.relname AS Name FROM pg_class AS c JOIN pg_namespace AS n ON (c.relnamespace=n.oid) WHERE -- no comments about system catalogs. n.nspname NOT IN ('pg_catalog','pg_toast','information_schema','xpg_advisor') -- only tables AND c.relkind = 'r' -- no primary key AND NOT c.relhaspkey ORDER BY Schema ASC, Name ASC; COMMENT ON VIEW da_tables_without_primary_key IS 'it is better to have a primary key on your tables'; -- SELECT * FROM da_tables_without_primary_key; -- -- tables with composite primary keys? -- CREATE VIEW da_tables_with_composite_primary_key AS SELECT n.nspname AS Schema, c.relname AS Name FROM pg_class AS c JOIN pg_namespace AS n ON (c.relnamespace=n.oid) JOIN pg_constraint AS o ON (o.conrelid=c.oid) WHERE -- no comments about system catalogs. n.nspname NOT IN ('pg_catalog','pg_toast','information_schema','xpg_advisor') -- only tables AND c.relkind = 'r' -- with a primary key AND c.relhaspkey -- the primary key constraint AND contype = 'p' -- more than 1 element in conkey AND array_upper(o.conkey, 1)>1 ORDER BY Schema ASC, Name ASC; COMMENT ON VIEW da_tables_with_composite_primary_key IS 'it may be considered a better design to have simple primary keys'; -- SELECT * FROM da_tables_with_composite_primary_key; -- -- missing array_index function -- CREATE OR REPLACE FUNCTION array_index(anyarray, anyelement) RETURNS INTEGER AS ' DECLARE tab ALIAS FOR $1; item ALIAS FOR $2; i INTEGER; BEGIN i := 1; LOOP IF i > array_upper(tab, 1) THEN RETURN NULL; END IF; IF tab[i] = item THEN RETURN i; END IF; i := i+1; END LOOP; END;' LANGUAGE plpgsql; COMMENT ON FUNCTION array_index(anyarray, anyelement) IS 'missing array_index function... should be already there!'; -- -- internal foreign key constraint... -- CREATE VIEW in_foreign_key_contraint AS SELECT -- constraint c.oid AS constraint, -- referencing table c.conrelid AS crel, cca.attname AS ccol, cca.atttypid AS ctyp, cca.atttypmod AS cmod, cca.attlen AS clen, -- referenced table, foreign part c.confrelid AS frel, fka.attname AS fcol, fka.atttypid AS ftyp, fka.atttypmod AS fmod, fka.attlen AS flen, array_index(c.confkey, fka.attnum) AS component FROM pg_constraint AS c JOIN pg_attribute AS cca ON (c.conrelid=cca.attrelid) JOIN pg_attribute AS fka ON (c.confrelid=fka.attrelid) WHERE -- foreign key constraint c.contype='f' -- column attribute in constraint AND cca.attnum = ANY (c.conkey) -- foreign key attribute AND fka.attnum = ANY (c.confkey) -- matching constraints AND array_index(c.confkey, fka.attnum)=array_index(c.conkey, cca.attnum) ; -- -- foreign keys which do not exactly match their target key types? -- CREATE VIEW da_foreign_key_type_dont_match AS SELECT nc.nspname AS Schema, cc.relname AS Name, fkc.ccol AS AttName, format_type(fkc.ctyp, fkc.cmod) AS ColumnType, nf.nspname AS FSchema, cf.relname AS FName, fkc.fcol AS FAttName, format_type(fkc.ftyp, fkc.fmod) AS ForeignType, fkc.component AS Component FROM in_foreign_key_contraint AS fkc JOIN pg_class AS cc ON (fkc.crel=cc.oid) JOIN pg_namespace AS nc ON (cc.relnamespace=nc.oid) --JOIN pg_attribute AS ac ON (fkc.ccol=ac.oid) JOIN pg_class AS cf ON (fkc.frel=cf.oid) JOIN pg_namespace AS nf ON (cf.relnamespace=nf.oid) --JOIN pg_attribute AS af ON (fkc.fcol=af.oid) WHERE -- no comments about system catalogs. nc.nspname NOT IN('pg_catalog','pg_toast','information_schema','xpg_advisor') AND nf.nspname NOT IN('pg_catalog','pg_toast','information_schema','xpg_advisor') -- only tables (redundant?) AND cc.relkind='r' AND cf.relkind='r' -- non matching type AND fkc.ctyp!=fkc.ftyp ORDER BY Schema ASC, Name ASC, FSchema ASC, FName ASC, Component ASC; COMMENT ON VIEW da_foreign_key_type_dont_match IS 'non matching foreing key component, maybe a bad design'; -- SELECT * FROM da_foreign_key_type_dont_match; -- -- others? -- -- foreign keys which do match thei
Re: [HACKERS] Question on restoring and compiled plans
On Friday 19 March 2004 15:04, Tom Lane wrote: > Richard Huxton <[EMAIL PROTECTED]> writes: > > If I pg_restore a DB and don't ANALYSE before a pl/pgsql trigger > > activates then that trigger's going to be planned based on the default > > stats forever isn't it? > > Only for the life of the current backend(s). Ah that's alright then. ANALYZE is usually the first thing I do after a restore anyway, but thought I'd check how important it was that I remember. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] COPY formatting
Andrew Dunstan <[EMAIL PROTECTED]> writes: > There are some wrinkles, though, concerning the interaction of CSV's > notion of escaping and COPY's notion of escaping. If someone want to > undertake this I can flesh those out in a further email. Please do that, so that the info is in the archives in case someone else wants to tackle the project. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
I wrote: Fernando Nasser wrote: [snip re multiline CSV fields] I have never seen such a beast, Export from a spreadsheet where people have formated the cell with the address on it. Well, I just tried with OpenOffice on my RH9 box, and it translated the line breaks in the cell into spaces in the CSV. When I replaced them with line breaks in a text editor and reloaded it treated them as separate rows. I don't have a Windows box handy on which I can test Excel's behaviour. Clearly my experience is out of date - I just found a web ref to Excel doing just this. Oh, well, it doesn't seem to me an insurmountable problem. cheers andrew ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] looking for an 'array_index' function?
Dear hackers, I'm looking for an array_index function that would return the index of the occurence of some element in an array. I noticed that I can know whether a element is in the array with ANY, but I also need the actual index. There is no such function under that name, and I cannot find any that would do the job in the doc. I made a quick workaround in plpgsql, but I would have expected the function to be there. Am I unlucky or just blind? I know that looking for such a function is a proof a bad design, but I don't feel responsible for the pg_catalog.* design;-) Thanks in advance for any pointer, Have a nice day, -- Fabien Coelho - [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] COPY formatting
Karel Zak <[EMAIL PROTECTED]> writes: > And what \n in attibutes data in CSV? I think CSV format doesn't use > some escape for newline char. It means psql with \copy cannot be sure > with CSV. I think CSV only allows embedded newlines that are either escaped, or inside quotes. COPY doesn't currently have the notion of a quote character, but that was part of Andrew's proposal ... regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Question on restoring and compiled plans
If I pg_restore a DB and don't ANALYSE before a pl/pgsql trigger activates then that trigger's going to be planned based on the default stats forever isn't it? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] 7.4.2 Build broken on (Sparc) Solaris 7 and 8
> Environment: > > SunOS 5.8 Generic_108528-27 sun4u sparc SUNW,Ultra-250 > gcc version 3.3.1 > PostgreSQL-7.4.2 > ./configure --with-java --enable-thread-safety > > "make" results in: > > gcc -O2 -fno-strict-aliasing -Wall -Wmissing-prototypes > -Wmissing-declarations -fPIC -I. -I../../../src/include > -DFRONTEND -DSYSCONFDIR='"/usr/local/pgsql/etc"' -c -o thread.o > thread.c > thread.c: In function `pqGetpwuid': > thread.c:116: error: too many arguments to function `*getpwuid_r*' > > > Environment: > > SunOS 5.7 Generic_106541-29 sun4u sparc SUNW,UltraSPARC-IIi-Engine > gcc version 3.3.1 > PostgreSQL-7.4.2 > ./configure --with-java --enable-thread-safety > > "make' results in: > > gcc -O2 -fno-strict-aliasing -Wall -Wmissing-prototypes > -Wmissing-declarations -fPIC -I. -I../../../src/include > -DFRONTEND -DSYSCONFDIR='"/usr/local/pgsql/etc"' -c -o thread.o > thread.c > thread.c: In function `pqGetpwuid': > thread.c:116: error: too many arguments to function `*getpwuid_r*' > thread.c: In function `pqGethostbyname': > thread.c:189: error: `resbuf' undeclared (first use in this function) > thread.c:189: error: (Each undeclared identifier is reported only once > thread.c:189: error: for each function it appears in.) > > Diff'ing thread.c between 7.4.1 and 7.4.2, it *looks* like, at first > blush, nothing changed that should affect the relevant code. > > Anybody got any idea what's broken? Unfortunately, I know exactly what is broken. First, pre7.4.2 didn't use thread.c properly. Now that it does, you are breaking on this issue: /* * Early POSIX draft of getpwuid_r() returns 'struct passwd *'. *getpwuid_r(uid, resultbuf, buffer, buflen) * Do we need to support it? bjm 2003-08-14 */ /* POSIX version */ getpwuid_r(uid, resultbuf, buffer, buflen, result); Notice the comment. Do we have to support getpwuid_r that returns passwd *, and doesn't take a fourth argument? Yea, for Solaris 7 & 8, we now we now do. I think you can get yours working by just changing the call to: result = getpwuid_r(uid, resultbuf, buffer, buflen); I will have to add configure tests for this and it will work properly for you in 7.4.3. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] Syntax error reporting (was Re: [PATCHES] syntax error position "CREATE FUNCTION" bug fix)
[ moving thread to hackers ] Fabien COELHO <[EMAIL PROTECTED]> writes: > However, I still stick with my "bad" simple idea because the simpler the > better, and also because of the following example: > ... > psql> SELECT count_tup('pg_shadow'); > ERROR: syntax error at or near "FRM" at character 22 > CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement > As you can notice, the extract is not in the submitted query, so there > is no point to show it there. Yeah. However, I dislike your solution because it confuses the cases of a syntax error in the actually submitted query, and a syntax error in an internally generated query. We should keep these cases clearly separate because clients may want to do different things. For a syntax error in the submitted input, what you probably want to do is edit and resubmit the original query --- that's the case I was thinking about in saying that a GUI client like pgadmin would want to set the editing cursor in the original input window. But this action is nonsensical if the syntax error is from a generated query. Perhaps the GUI client could be smart enough to pop up a new window in which one could edit and resubmit the erroneous function definition. Even in psql's simplistic error handling, you want to distinguish the two cases. There's no point in showing the entire original query; one line worth of context is plenty. But you very probably do want to see all of a generated query. So I don't want the backend sending back error reports that look the same in both cases. The original design concept for the 'P' (position) error field is that it would be used to locate syntax errors in the *original query*, and so its presence is a cue to the client code to go in the direction of setting the editing cursor. (Note the protocol specification says "index into the original query string".) We have in fact misimplemented it, because it is being set for syntax errors in internally generated queries too. I was already planning to modify plpgsql to send back the full text of generated queries when there is an error. My intention was to supply this just as part of the CONTEXT stack, that is instead of your example of ERROR: syntax error at or near "FRM" at character 22 CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement you'd get something like ERROR: syntax error at or near "FRM" at character 22 CONTEXT: Executing command "SELECT COUNT(*) AS c FRM pg_shadow" PL/pgSQL function "count_tup" line 4 at for over execute statement However it might be better to invent a new error-message field that carries just the text of the SQL command, rather than stuffing it into CONTEXT. (This is similar to your original patch, but different in detail because I'm envisioning sending back generated queries, never the submitted query. Regurgitating the submitted query is just a waste of bandwidth.) The plus side of that would be that it'd be easy to extract for syntax-error highlighting. The minuses are that existing clients would fail to print such a field (the protocol spec says to ignore unknown fields), and that there is no good way to cope with nested queries. A possible compromise is to put the text of the generated SQL command into a new field only if the error is a syntax error, and put it into the CONTEXT stack otherwise. Syntax errors couldn't be nested so at least that problem goes away. This seems a bit messy though. The other thing to think about is whether we should invent a new field to carry syntax error position for generated queries, rather than making 'P' do double duty as it does now. If we don't do that then we have to change the protocol specification to reflect reality. In any case I think it has to be possible to tell very easily from the error message whether the 'P' position refers to the submitted query or a generated query. Comments anyone? regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY formatting
> > > Karel Zak wrote: > >> The problem with CSV is that it will correctly work with new protocol >> only. Because old versions of clients are newline sensitive. And CSV >> can contains newline in by quotation marks defined attributes: >> >> "John", "Smith", "The White House >> 1600 Pennsylvania Avenue NW >> Washington, DC 20500", "male", "open source software office" >> >> It is one record. >> > > (Long Live President Smith!) > > I have never seen such a beast, and if I had been asked off the cuff > would have said that it was probably illegal, except that I know of no > standard actually defining the format. Perhaps others (Joshua Drake or > Josh Berkus?) have wider experience. I think in any case we should > ignore those for now and handle the straightforward case. > > I *have* seen monstrosities like fields that do not begin with the quote > character but then break into a quote, e.g.: > > 1,2,a,123"abc""def",6,7,8 > I have dealt with both, frequently. The CSV format allows you to begin a block of text with the quote. Newlines are included in that quoted space. If qoutes are included in the field then the quotes are double quotes to denote they are not part of the quoted space. Also, the following is valid. 1,2,,,"",,,"" "" is empty. 1,2,3,"", The 4 quotes denote a single double quote. Writing simple CSV converts that just explode on commas and newlines miss these oddities. Try exporting an Access table with a Memo field (containg multiple lines) to CSV. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
Thomas, Andrew, Karel, Thomas is correct: many applications which read or make CSVs will accept a newline if it is enclosed in a quote. > > I *have* seen monstrosities like fields that do not begin with the quote > > character but then break into a quote, e.g.: > > > > 1,2,a,123"abc""def",6,7,8 This I have never seen. It looks like a hackish error to me. What application is it from? Frankly, I would expect any CSV reader to error out on the above, and would be annoyed if it did not. Overall, I assert again that approaching this issue through COPY enhancements is really not the way to go.We should be looking at a client utility, like pg_import and pg_export. The primary purpose of COPY is bulk loads for backup/restore, and I'm against doing a lot of tinkering which might make it less efficient or introduce new issues into what's currently very reliable. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COPY formatting
Bruce Momjian wrote: OK, so for the separator, quote, and escape options: separator is , quote is " escape is " so if the quote and escape are the same, then a double denotes a single? Yes. i.e. with the above settings "abc""def" -> abc"def cheers andrew ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COPY formatting
Josh Berkus wrote: Overall, I assert again that approaching this issue through COPY enhancements is really not the way to go.We should be looking at a client utility, like pg_import and pg_export. The primary purpose of COPY is bulk loads for backup/restore, and I'm against doing a lot of tinkering which might make it less efficient or introduce new issues into what's currently very reliable. That's not unreasonable. I floated my idea as an alternative to a much more radical proposal. If we decided against it we should remove the TODO item. As against that, if we don't do this then I think we should embrace these utility programs more, possibly bringing them into the distribution. cheers andrew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] COPY formatting
Andrew Dunstan wrote: > Josh Berkus wrote: > > > > >Overall, I assert again that approaching this issue through COPY enhancements > >is really not the way to go.We should be looking at a client utility, > >like pg_import and pg_export. The primary purpose of COPY is bulk loads > >for backup/restore, and I'm against doing a lot of tinkering which might make > >it less efficient or introduce new issues into what's currently very > >reliable. > > > > > > > > That's not unreasonable. I floated my idea as an alternative to a much > more radical proposal. If we decided against it we should remove the > TODO item. > > As against that, if we don't do this then I think we should embrace > these utility programs more, possibly bringing them into the distribution. CSV seems to be the most widely requested conversion format. Anything else is probably a one-off job that should be done in perl or sed. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] looking for an 'array_index' function?
Fabien COELHO wrote: Am I unlucky or just blind? Unlucky I guess. No such function, least not yet. Joe ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] [HACKERS] fsync method checking
I wrote: > Note, too, that the preferred method isn't likely to depend just on the > operating system, it's likely to depend also on the filesystem type > being used. > > Linux provides quite a few of them: ext2, ext3, jfs, xfs, and reiserfs, > and that's just off the top of my head. I imagine the performance of > the various syncing methods will vary significantly between them. For what it's worth, my database throughput for transactions involving a lot of inserts, updates, and deletes is about 12% faster using fdatasync() than O_SYNC under Linux using JFS. I'll run the test program and report my results with it as well, so we'll be able to see if there's any consistency between it and the live database. -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])