[HACKERS] Comment typo in xlog.c
Hi, I found two typos in xlog.c. Could you apply the attached patch? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center (attachment: typo.patch) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] wal_sender_delay is still required?
Hi, Walsender doesn't need the periodic wakeups anymore, thanks to the latch feature. So wal_sender_delay is basically useless now. How about dropping wal_sender_delay or increasing the default value? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
Tom Lane t...@sss.pgh.pa.us writes: Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. It used not to be exposed at the SQL level; it was just an internal loop in pg_execute_sql_file() when using the placeholder-enabled variant. Then Itagaki wanted me to expose the internals so that he could basically implement the logic in SQL directly. It seems like we went a step too far in exposing this facility, too. Agreed on removing it at the SQL level. I assume you want me to prepare a patch; I'm not sure I'll be able to send it to the list before Thursday --- unless I get around the git network errors at pgday. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED - file_fdw
2010/11/25 Shigeru HANADA han...@metrosystems.co.jp: Hi, hackers, Attached is a patch that adds file_fdw, an FDW which reads records from files on the server side, as a contrib module. This patch is based on the SQL/MED core functionality patch. [SQL/MED - core functionality] http://archives.postgresql.org/pgsql-hackers/2010-11/msg01698.php File_fdw can be installed with steps similar to other contrib modules, and you can create the FDW with the script: $SHAREDIR/contrib/file_fdw.sql Note that you need to create file_fdw in each database. Documentation for file_fdw is included in the patch, although the contents might not be enough. Any comments and questions are welcome. I think it would be better to add an encoding option to FileFdwOption. In the patch the encoding of the file is assumed to be client_encoding, but we may want to SELECT from a differently encoded CSV file in a query. Apart from the FDW issue, I've been thinking that client_encoding is not appropriate for COPY in any way: client_encoding is the encoding of the statement the client sends, not of the COPY target, which is on the server's filesystem. Adding an encoding option to COPY would also eliminate the allowEncodingChanges option from the JDBC driver. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Comment typo in xlog.c
On 06.12.2010 09:01, Fujii Masao wrote: I found two typos in xlog.c. Could you apply the attached patch? Applied. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation
On Sun, Dec 5, 2010 at 01:04, Peter Eisentraut pete...@gmx.net wrote: Here is an updated patch to address the issues discussed during this commitfest. Here are comments and questions after I tested the latest patch:
Issues
* initdb itself seems to succeed, but it prints "could not determine encoding for locale" messages for any combination of encoding=utf8/eucjp and locale=ja_JP.utf8/ja_JP.eucjp/C. Is this expected behavior?
creating collations ...initdb: locale name has non-ASCII characters, skipped: bokmål
initdb: locale name has non-ASCII characters, skipped: français
could not determine encoding for locale hy_AM.armscii8: codeset is ARMSCII-8
... (a dozen lines) ...
could not determine encoding for locale vi_VN.tcvn: codeset is TCVN5712-1
ok
* contrib/citext raises an encoding error when COLLATE is specified, even if it is the same collation as the database default. We might need some special treatment for the C locale.
=# SHOW lc_collate; == C
=# SELECT ('A'::citext) = ('a'::citext); == false
=# SELECT ('A'::citext) = ('a'::citext) COLLATE C;
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.
* pg_dump would generate unportable files for different platforms, because collation names are platform-dependent.
Source code
* PG_GETARG_COLLATION might be a better name than PG_GET_COLLATION.
* What is the difference between InvalidOid and DEFAULT_COLLATION_OID for collation oids? The patch replaces DirectFunctionCall with DirectFunctionCallC in some places, but we could shrink the diff size if we could use InvalidOid instead of DEFAULT_COLLATION_OID.
* I still think explicitly passing collations from function to function is horrible, because we might forget it in some places, and almost no existing third-party module will work. Is it possible to make it a global variable, and push/pop the state when it changes? Sorry if I'm missing something, but I think we could treat the collation setting like GUC settings.
-- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
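For illustration, here is a minimal sketch (not taken from the patch; all names are invented) of the save/restore discipline the last point is asking about: a process-global "current collation" that callers push before an operation and pop afterwards, much as GUC values are saved and restored.

    typedef unsigned int Oid;                /* stand-in for PostgreSQL's Oid */

    static Oid current_collation = 0;        /* 0 playing the role of InvalidOid */

    /* Install a new collation; the caller keeps the return value for the pop. */
    static Oid
    push_collation(Oid collid)
    {
        Oid saved = current_collation;

        current_collation = collid;
        return saved;
    }

    /* Restore whatever was in effect before the matching push. */
    static void
    pop_collation(Oid saved)
    {
        current_collation = saved;
    }

Usage would be: Oid saved = push_collation(collid); ... perform the collation-aware operation ...; pop_collation(saved); the open question raised above is whether such a scheme can interact safely with errors and nested calls, which is exactly what the GUC machinery already handles.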
Re: [HACKERS] [BUGS] BUG #5662: Incomplete view
On sön, 2010-09-19 at 14:28 -0400, Tom Lane wrote: Or maybe we could implement that function, call it like this CAST((pg_sequence_parameters(c.oid)).max_value AS cardinal_number) AS maximum_value, and plan on optimizing the view when we get LATERAL. Here is an implementation of that. I'm not exactly sure if the accesses to the sequence are correctly locked/unlocked, but it appears to work. I also revised the definition of the info schema view slightly, after juggling several more recent SQL standard drafts.
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 9d30949..0c1cb5d 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -4013,31 +4013,42 @@ ORDER BY c.ordinal_position;
      </row>
 
      <row>
-      <entry><literal>maximum_value</literal></entry>
-      <entry><type>cardinal_number</type></entry>
-      <entry>Not yet implemented</entry>
+      <entry><literal>start_value</literal></entry>
+      <entry><type>character_data</type></entry>
+      <entry>The start value of the sequence</entry>
      </row>
 
      <row>
       <entry><literal>minimum_value</literal></entry>
-      <entry><type>cardinal_number</type></entry>
-      <entry>Not yet implemented</entry>
+      <entry><type>character_data</type></entry>
+      <entry>The minimum value of the sequence</entry>
+     </row>
+
+     <row>
+      <entry><literal>maximum_value</literal></entry>
+      <entry><type>character_data</type></entry>
+      <entry>The maximum value of the sequence</entry>
      </row>
 
      <row>
       <entry><literal>increment</literal></entry>
-      <entry><type>cardinal_number</type></entry>
-      <entry>Not yet implemented</entry>
+      <entry><type>character_data</type></entry>
+      <entry>The increment of the sequence</entry>
      </row>
 
      <row>
       <entry><literal>cycle_option</literal></entry>
       <entry><type>yes_or_no</type></entry>
-      <entry>Not yet implemented</entry>
+      <entry><literal>YES</literal> if the sequence cycles, else <literal>NO</literal></entry>
      </row>
     </tbody>
    </tgroup>
   </table>
+
+  <para>
+   Note that in accordance with the SQL standard, the start, minimum,
+   maximum, and increment values are returned as character strings.
+  </para>
 </sect1>
 
 <sect1 id="infoschema-sql-features">
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 8d9790d..1c2bd85 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -1430,16 +1430,17 @@ CREATE VIEW sequences AS
            CAST(64 AS cardinal_number) AS numeric_precision,
            CAST(2 AS cardinal_number) AS numeric_precision_radix,
            CAST(0 AS cardinal_number) AS numeric_scale,
-           CAST(null AS cardinal_number) AS maximum_value, -- FIXME
-           CAST(null AS cardinal_number) AS minimum_value, -- FIXME
-           CAST(null AS cardinal_number) AS increment, -- FIXME
-           CAST(null AS yes_or_no) AS cycle_option -- FIXME
+           CAST((pg_sequence_parameters(c.oid)).start_value AS character_data) AS start_value,
+           CAST((pg_sequence_parameters(c.oid)).minimum_value AS character_data) AS minimum_value,
+           CAST((pg_sequence_parameters(c.oid)).maximum_value AS character_data) AS maximum_value,
+           CAST((pg_sequence_parameters(c.oid)).increment AS character_data) AS increment,
+           CAST(CASE WHEN (pg_sequence_parameters(c.oid)).cycle_option THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option
     FROM pg_namespace nc, pg_class c
     WHERE c.relnamespace = nc.oid
           AND c.relkind = 'S'
           AND (NOT pg_is_other_temp_schema(nc.oid))
           AND (pg_has_role(c.relowner, 'USAGE')
-               OR has_table_privilege(c.oid, 'SELECT, UPDATE') );
+               OR has_sequence_privilege(c.oid, 'SELECT, UPDATE, USAGE') );
 
 GRANT SELECT ON sequences TO PUBLIC;
 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index bb8ebce..0070bb0 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -24,6 +24,7 @@
 #include "commands/defrem.h"
 #include "commands/sequence.h"
 #include "commands/tablecmds.h"
+#include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "storage/bufmgr.h"
@@ -1420,6 +1421,56 @@ process_owned_by(Relation seqrel, List *owned_by)
 }
 
+/*
+ * Return sequence parameters, for use by information schema
+ */
+Datum
+pg_sequence_parameters(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	TupleDesc	tupdesc;
+	Datum		values[5];
+	bool		isnull[5];
+	SeqTable	elm;
+	Relation	seqrel;
+	Buffer		buf;
+	Form_pg_sequence seq;
+
+	/* open and AccessShareLock sequence */
+	init_sequence(relid, &elm, &seqrel);
+
+	if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied for sequence %s",
+						RelationGetRelationName(seqrel))));
+
+	tupdesc = CreateTemplateTupleDesc(5, false);
+
Re: [HACKERS] WIP patch for parallel pg_dump
On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.12.2010 02:55, Robert Haas wrote: On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be (1) xmin <= any listed XIDs < xmax (2) xmin not so old as to cause GlobalXmin to decrease (3) xmax not beyond current XID counter (4) XID list includes all still-running XIDs in the given range Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. The client doesn't need to know anything about the snapshot blob that the server gives it. It just needs to pass it back to the server through the other connection. To the client, it's just an opaque chunk of bytes. I suppose that would work, but I still think it's a bad idea. We made this mistake with expression trees. Any oversight in the code that validates the chunk of bytes when it (or a modified version) is sent back to the server turns into a security hole. I think it's a whole lot simpler and cleaner to keep the representation details private to the server. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
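For concreteness, here is a minimal sketch of the four sanity checks quoted above, written against a made-up ClientSnapshot struct rather than PostgreSQL's real snapshot representation; a real implementation would also have to use the wraparound-aware TransactionId comparison macros instead of the plain integer comparisons used here.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t XidT;                       /* simplified stand-in for TransactionId */

    typedef struct ClientSnapshot
    {
        XidT        xmin;
        XidT        xmax;
        int         nxip;
        XidT       *xip;                         /* in-progress XIDs listed by the client */
    } ClientSnapshot;

    /* Validate a client-supplied snapshot against the four conditions above. */
    static bool
    snapshot_is_sane(const ClientSnapshot *snap,
                     XidT global_xmin,           /* (2) must not be made to go backwards */
                     XidT next_xid,              /* (3) the current XID counter */
                     bool (*xid_is_running) (XidT))
    {
        /* (2) xmin must not be older than the current global xmin */
        if (snap->xmin < global_xmin)
            return false;

        /* (3) xmax must not lie beyond the current XID counter */
        if (snap->xmax > next_xid)
            return false;

        /* (1) every listed XID must satisfy xmin <= xid < xmax */
        for (int i = 0; i < snap->nxip; i++)
        {
            if (snap->xip[i] < snap->xmin || snap->xip[i] >= snap->xmax)
                return false;
        }

        /* (4) every still-running XID in [xmin, xmax) must appear in the list */
        for (XidT xid = snap->xmin; xid != snap->xmax; xid++)
        {
            bool found = false;

            if (!xid_is_running(xid))
                continue;
            for (int i = 0; i < snap->nxip; i++)
            {
                if (snap->xip[i] == xid)
                {
                    found = true;
                    break;
                }
            }
            if (!found)
                return false;
        }
        return true;
    }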
Re: [HACKERS] SQL/MED - file_fdw
On Mon, Dec 6, 2010 at 5:48 AM, Hitoshi Harada umi.tan...@gmail.com wrote: I think it is better to add encoding option to FileFdwOption. In the patch the encoding of file is assumed as client_encoding, but we may want to SELECT from different-encoded csv in a query. Apart from the issue with fdw, I've been thinking client_encoding for COPY is not appropriate in any way. client_encoding is the encoding of the statement the client sends, not the COPY target which is on the server's filesystem. Adding encoding option to COPY will eliminate allowEncodingChanges option from JDBC driver. Yeah, this point has been raised before, and I agree with it. I haven't heard anyone who speaks a European language complain about this, but it seems to keep coming up for Japanese speakers. I am guessing that means that using multiple encodings is fairly common in Japan. I typically don't run into anything other than UTF-8 and Latin-1, which are mostly compatible especially if you're an English speaker, but if it weren't for that happy coincidence I think this would be quite annoying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On Sun, Dec 5, 2010 at 5:10 AM, Magnus Hagander mag...@hagander.net wrote: On Sun, Dec 5, 2010 at 10:22, Marc Balmer m...@msys.ch wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically. It's only horrible if you stick to printf style formatting and you are using string techniques to inject parameters into the query. Non parameterized queries should obviously be discouraged. However, it's entirely possible to wrap the parameterized interfaces with a vararg interface (I should know, because we did exactly that) :-). This gives you the best of both worlds, easy coding without sacrificing safety. You might not remember the libpqtypes proposal, but libpq was specifically extended with callbacks so that libpqtypes could exist after the community determined that libpqtypes was too big of a change to the libpq library. I think ultimately this should be revisited, with libpqtypes going in core or something even richer...I've been thinking for a while that postgres types should be abstracted out of the backend into a library that both client and server depend on. With libpqtypes, we decided to use postgres-style format markers:
PQexecf(conn, "select %int4 + %int8", an_int, a_bigint);
Everything is schema qualified, so that user types are supported (of course, this requires implementing handling on the client). Data is routed through the binary protocol, with all the byte swapping etc handled by the library. No escaping necessary. We also added full support for arrays and composites, which are a nightmare to deal with over straight libpq, and various other niceties like thread-safe error handling. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
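For comparison, here is a minimal sketch of the plain-libpq parameterized route Magnus refers to, using only the documented PQexecParams() call; the table and column names are made up for the example.

    #include <libpq-fe.h>

    /* The parameter travels out-of-line, so no escaping is needed. */
    static PGresult *
    lookup_by_name(PGconn *conn, const char *name_pattern)
    {
        const char *params[1];

        params[0] = name_pattern;
        return PQexecParams(conn,
                            "SELECT name, nr, id FROM address WHERE name LIKE $1",
                            1,              /* number of parameters */
                            NULL,           /* let the server infer the parameter type */
                            params,         /* parameter values */
                            NULL,           /* lengths: not needed for text-format params */
                            NULL,           /* formats: all text */
                            0);             /* ask for a text-format result */
    }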
Re: [HACKERS] WIP patch for parallel pg_dump
On 06.12.2010 14:57, Robert Haas wrote: On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The client doesn't need to know anything about the snapshot blob that the server gives it. It just needs to pass it back to the server through the other connection. To the client, it's just an opaque chunk of bytes. I suppose that would work, but I still think it's a bad idea. We made this mistake with expression trees. Any oversight in the code that validates the chunk of bytes when it (or a modified version) is sent back to the server turns into a security hole. True, but a snapshot is a lot simpler than an expression tree. It's pretty much impossible to plug all the holes in the expression-tree reading functions, and keep them hole-free in the future. The expression tree format is constantly in flux. A snapshot, however, is a fairly isolated small data structure that rarely changes. I think it's a whole lot simpler and cleaner to keep the representation details private to the server. Well, then you need some sort of cross-backend communication, which is always a bit clumsy. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist - 0.8
On Sat, 4 Dec 2010, Greg Stark wrote: On Sat, Dec 4, 2010 at 6:07 PM, Oleg Bartunov o...@sai.msu.su wrote: We'll sync contrib/btree_gist with current API. Also, we're thinking about distance for ltree, boxes, circles. I'm not sure about polygons, though. So, we'll have rather complete set of knn-fied data types. I kind of assumed the natural client for KNN-gist was the tsearch full text search indexes handling sorting by relevance. For example if I search for Postgres DBA I should find documents where those words appear adjacent first and documents where the two words appear far apart in the document sorted further down. Is that not on the list of operators supported or planned to be supported? We'll start thinking about this once we know how to store coordinate information in index :) Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.12.2010 14:57, Robert Haas wrote: On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The client doesn't need to know anything about the snapshot blob that the server gives it. It just needs to pass it back to the server through the other connection. To the client, it's just an opaque chunk of bytes. I suppose that would work, but I still think it's a bad idea. We made this mistake with expression trees. Any oversight in the code that validates the chunk of bytes when it (or a modified version) is sent back to the server turns into a security hole. True, but a snapshot is a lot simpler than an expression tree. It's pretty much impossible to plug all the holes in the expression-tree reading functions, and keep them hole-free in the future. The expression tree format is constantly in flux. A snapshot, however, is a fairly isolated small data structure that rarely changes. I guess. It still seems far too much like exposing the server's guts for my taste. It might not be as bad as the expression tree stuff, but there's nothing particularly good about it either. I think it's a whole lot simpler and cleaner to keep the representation details private to the server. Well, then you need some sort of cross-backend communication, which is always a bit clumsy. A temp file seems quite sufficient, and not at all difficult. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
On 06.12.2010 08:51, Fujii Masao wrote: On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masaomasao.fu...@gmail.com wrote: On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masaomasao.fu...@gmail.com wrote: The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait -forever' (i.e., you set allow_standalone_master to false), the master should detect the standby crash as soon as possible by using the timeout. For example, imagine that max_wal_senders is set to one and the master cannot detect the standby crash because of absence of the timeout. In this case, even if you start new standby, it will not be able to connect to the master since there is no free walsender slot. As the result, the master actually waits forever. This occurred to me that the timeout would be required even for asynchronous streaming replication. So, how about implementing the replication timeout feature before synchronous replication itself? Here is the patch. This is one of features required for synchronous replication, so I added this into current CF as a part of synchronous replication. Hmm, that's actually a quite different timeout than what's required for synchronous replication. In synchronous replication, you need to get an acknowledgment within a timeout. This patch only puts a timeout on how long we wait to have enough room in the TCP send buffer. That doesn't seem all that useful. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Am 06.12.10 15:37, schrieb Merlin Moncure: On Sun, Dec 5, 2010 at 5:10 AM, Magnus Hagander mag...@hagander.net wrote: On Sun, Dec 5, 2010 at 10:22, Marc Balmer m...@msys.ch wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically. It's only horrible if you stick to printf style formatting and you are using string techniques to inject parameters into the query. Non parameterized queries should obviously be discouraged. However, it's entirely possible to wrap the parameterized interfaces with a vararg interface (I should know, because we did exactly that) :-). This gives you the best of both worlds, easy coding without sacrificing safety. You might not remember the libpqtypes proposal, but libpq was specifically extended with callbacks so that libpqtypes could exist after the community determined that libpqtypes was too big of a change to the libpq library. I think ultimately this should be revisited, with libpqtypes going in core or something even richer...I've been thinking for a while that postgres types should be abstracted out of the backend into a library that both client and server depend on. With libpqtypes, we decided to use postgres-style format markers:
PQexecf(conn, "select %int4 + %int8", an_int, a_bigint);
Everything is schema qualified, so that user types are supported (of course, this requires implementing handling on the client). Data is routed through the binary protocol, with all the byte swapping etc handled by the library. No escaping necessary. We also added full support for arrays and composites, which are a nightmare to deal with over straight libpq, and various other niceties like thread-safe error handling. That would be a *HUGE* piece of software compared to the relatively small thing I am suggesting... As for escaping (or not escaping) of string arguments, that can be seen as a bug or a feature. I do not want automatic escaping of string arguments in all cases, e.g. I might want to construct an SQL statement with dynamic parts WHERE xy or AND a = b. hypothetical example:
filter = "WHERE name like 'Balmer%'";
if (sort == SORT_DESC)
    sort = "ORDER BY name DESCENDING";
PQvexec(conn, "SELECT name, nr, id FROM address %s%s", filter, sort);
So what I am aiming at right now is a PQvexec() function that basically has printf() like semantics, but adds an additional token to the format string (printf uses %s and %b to produce strings.) I am thinking of adding %S and %B, which produce strings that are escaped. That would be a small function, and reasonably safe. Or rather, the safety is in the hands of the programmer. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
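To make the %S idea concrete, here is a minimal sketch of such a wrapper, using the documented PQescapeStringConn() for the escaping; buffer sizing, error handling, %B, and the ordinary printf conversions are all omitted, so this is illustrative only, not a recommended implementation (the function name is hypothetical).

    #include <libpq-fe.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    #define QBUFSZ 8192                     /* arbitrary; a real version must size this properly */

    static PGresult *
    PQvexec_sketch(PGconn *conn, const char *fmt, ...)
    {
        char        query[QBUFSZ];
        char       *out = query;
        va_list     ap;

        va_start(ap, fmt);
        for (const char *p = fmt; *p != '\0'; p++)
        {
            if (p[0] == '%' && p[1] == 'S')
            {
                /* escape the string argument before splicing it into the query */
                const char *arg = va_arg(ap, const char *);

                out += PQescapeStringConn(conn, out, arg, strlen(arg), NULL);
                p++;                        /* skip over the 'S' */
            }
            else
                *out++ = *p;                /* copied verbatim; note: no bounds checking */
        }
        *out = '\0';
        va_end(ap);

        return PQexec(conn, query);
    }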
Re: [HACKERS] WIP patch for parallel pg_dump
On 06.12.2010 15:53, Robert Haas wrote: I guess. It still seems far too much like exposing the server's guts for my taste. It might not be as bad as the expression tree stuff, but there's nothing particularly good about it either. Note that we already have txid_current_snapshot() function, which exposes all that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal_sender_delay is still required?
Fujii Masao masao.fu...@gmail.com writes: Walsender doesn't need the periodic wakeups anymore, thanks to the latch feature. So wal_sender_delay is basically useless now. How about dropping wal_sender_delay or increasing the default value? If we don't need it, we should remove it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On Mon, Dec 6, 2010 at 9:55 AM, Marc Balmer m...@msys.ch wrote: Am 06.12.10 15:37, schrieb Merlin Moncure: On Sun, Dec 5, 2010 at 5:10 AM, Magnus Hagander mag...@hagander.net wrote: On Sun, Dec 5, 2010 at 10:22, Marc Balmer m...@msys.ch wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically. It's only horrible if you stick to printf style formatting and you are using string techniques to inject parameters into the query. Non parameterized queries should obviously be discouraged. However, it's entirely possible to wrap the parameterized interfaces with a vararg interface (I should know, because we did exactly that) :-). This gives you the best of both worlds, easy coding without sacrificing safety. You might not remember the libpqtypes proposal, but libpq was specifically extended with callbacks so that libpqtypes could exist after the community determined that libpqtypes was too big of a change to the libpq library. I think ultimately this should be revisited, with libpqtypes going in core or something even richer...I've been thinking for a while that postgres types should be abstracted out of the backend into a library that both client and server depend on. With libpqtypes, we decided to use postgres-style format markers:
PQexecf(conn, "select %int4 + %int8", an_int, a_bigint);
Everything is schema qualified, so that user types are supported (of course, this requires implementing handling on the client). Data is routed through the binary protocol, with all the byte swapping etc handled by the library. No escaping necessary. We also added full support for arrays and composites, which are a nightmare to deal with over straight libpq, and various other niceties like thread-safe error handling. That would be a *HUGE* piece of software compared to the relatively small thing I am suggesting... well, it's already written. All you would have to do is compile it. As for escaping (or not escaping) of string arguments, that can be seen as a bug or a feature. I do not want automatic escaping of string arguments in all cases, e.g. I might want to construct an SQL statement with dynamic parts WHERE xy or AND a = b. libpqtypes doesn't escape at all. It uses the internal parameterized interfaces that don't require it. For particular types, like bytea and timestamps, this is much faster because we use the binary wire format. Less load on the client and the server. hypothetical example:
filter = "WHERE name like 'Balmer%'";
if (sort == SORT_DESC)
    sort = "ORDER BY name DESCENDING";
PQvexec(conn, "SELECT name, nr, id FROM address %s%s", filter, sort);
So what I am aiming at right now is a PQvexec() function that basically has printf() like semantics, but adds an additional token to the format string (printf uses %s and %b to produce strings.) I am thinking of adding %S and %B, which produce strings that are escaped. That would be a small function, and reasonably safe. Or rather, the safety is in the hands of the programmer. What you are suggesting doesn't provide a lot of value over sprintf the query first, then exec it. You can do what you are suggesting yourself, wrapping PQexec. A hypothetical wrapper would be implemented something like:
va_list ap;
char buf[BUFSZ];
va_start(ap, query);
vsnprintf(buf, BUFSZ, query, ap);
va_end(ap);
return PQexec(conn, buf);
This is a bad idea (security, escaping, performance)... we wrote a faster, safer way to do it, with richer type support. Or you can do it yourself. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. It used not to being exposed at the SQL level, but just an internal loop in pg_execute_sql_file() when using the placeholders enabled variant. Then Itagaki wanted me to expose internals so that he basically can implement the logics in SQL directly. It seems like we went a step too far in exposing this facility too. Agreed in removing it at the SQL level. Well, actually, my next question was going to be about removing the variadic substitution in pg_execute_string too. It's not apparent to me that that function should have a rather lame substitution mechanism hard-wired into it, when you can do the same thing with replace() in front of it. On the whole I'd prefer not to have any substitution functionality hard-wired into pg_execute_file either, though I can see the argument that it's necessary for practical use. Basically I'm concerned that replace-equivalent behavior is not going to be satisfactory over the long run: I think eventually we're going to need to think about quoting/escaping behavior. So I think it's a bad idea to expose the assumption that it'll be done that way at the SQL level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
On Mon, Dec 6, 2010 at 9:54 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: This occurred to me that the timeout would be required even for asynchronous streaming replication. So, how about implementing the replication timeout feature before synchronous replication itself? Here is the patch. This is one of features required for synchronous replication, so I added this into current CF as a part of synchronous replication. Hmm, that's actually a quite different timeout than what's required for synchronous replication. In synchronous replication, you need to get an acknowledgment within a timeout. This patch only puts a timeout on how long we wait to have enough room in the TCP send buffer. That doesn't seem all that useful. Yeah. If we rely on the TCP send buffer filling up, then the amount of time the master takes to notice a dead standby is going to be hard for the user to predict. I think the standby ought to send some sort of heartbeat and the master should declare the standby dead if it doesn't see a heartbeat soon enough. Maybe the heartbeat could even include the receive/fsync/replay LSNs, so that sync rep can use the same machinery but with more aggressive policies about when they must be sent. I also can't help noticing that this approach requires drilling a hole through the abstraction stack. We just invented latches; if the API is going to have to change every time someone wants to implement a feature, we've built ourselves an awfully porous abstraction layer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
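To make that concrete, a hypothetical heartbeat message of the sort being suggested might carry something like the fields below; the struct name, field names and layout are invented for illustration and are not an actual protocol message.

    #include <stdint.h>

    /* Hypothetical standby-to-master heartbeat carrying replication progress. */
    typedef struct StandbyHeartbeat
    {
        uint64_t    receive_lsn;    /* last WAL position received from the master */
        uint64_t    fsync_lsn;      /* last WAL position flushed to durable storage */
        uint64_t    replay_lsn;     /* last WAL position applied by recovery */
        int64_t     sent_at;        /* standby timestamp, for the master's timeout bookkeeping */
    } StandbyHeartbeat;

The master would declare the standby dead if no heartbeat arrives within the configured timeout, independently of how full the TCP send buffer happens to be.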
Re: [HACKERS] WIP patch for parallel pg_dump
On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.12.2010 15:53, Robert Haas wrote: I guess. It still seems far too much like exposing the server's guts for my taste. It might not be as bad as the expression tree stuff, but there's nothing particularly good about it either. Note that we already have txid_current_snapshot() function, which exposes all that. Fair enough, and I think that's actually useful for Slony &c. But I don't think we should shy away from providing a cleaner API here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FK's to refer to rows in inheritance child
Excerpts from Andrew Dunstan's message of Sun Dec 05 14:41:20 -0300 2010: What I would like to see is people publishing the location of development repos so that they can be pulled from or merged, especially for any large patch. Yes, this is pretty useful. Dimitri published his repos for the extension stuff, which I followed for a while, made some smallish changes and sent them back, etc. Very time-saving. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On Mon, Dec 06, 2010 at 10:14:55AM -0500, Merlin Moncure wrote: On Mon, Dec 6, 2010 at 9:55 AM, Marc Balmer m...@msys.ch wrote: Am 06.12.10 15:37, schrieb Merlin Moncure: On Sun, Dec 5, 2010 at 5:10 AM, Magnus Hagander mag...@hagander.net wrote: On Sun, Dec 5, 2010 at 10:22, Marc Balmer m...@msys.ch wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically. It's only horrible if you stick to printf style formatting and you are using string techniques to inject parameters into the query. Non parameterized queries should obviously be discouraged. However, it's entirely possible to wrap the parameterized interfaces with a vararg interface (I should know, because we did exactly that) :-). This gives you the best of both worlds, easy coding without sacrificing safety. You might not remember the libpqtypes proposal, but libpq was specifically extended with callbacks so that libpqtypes could exist after the community determined that libpqtypes was too big of a change to the libpq library. I think ultimately this should be revisited, with libpqtypes going in core or something even richer...I've been thinking for a while that postgres types should be abstracted out of the backend into a library that both client and server depend on. With libpqtypes, we decided to use postgres-style format markers:
PQexecf(conn, "select %int4 + %int8", an_int, a_bigint);
Everything is schema qualified, so that user types are supported (of course, this requires implementing handling on the client). Data is routed through the binary protocol, with all the byte swapping etc handled by the library. No escaping necessary. We also added full support for arrays and composites, which are a nightmare to deal with over straight libpq, and various other niceties like thread-safe error handling. That would be a *HUGE* piece of software compared to the relatively small thing I am suggesting... well, it's already written. All you would have to do is compile it. As for escaping (or not escaping) of string arguments, that can be seen as a bug or a feature. I do not want automatic escaping of string arguments in all cases, e.g. I might want to construct an SQL statement with dynamic parts WHERE xy or AND a = b. libpqtypes doesn't escape at all. It uses the internal parameterized interfaces that don't require it. For particular types, like bytea and timestamps, this is much faster because we use the binary wire format. Less load on the client and the server. hypothetical example:
filter = "WHERE name like 'Balmer%'";
if (sort == SORT_DESC)
    sort = "ORDER BY name DESCENDING";
PQvexec(conn, "SELECT name, nr, id FROM address %s%s", filter, sort);
So what I am aiming at right now is a PQvexec() function that basically has printf() like semantics, but adds an additional token to the format string (printf uses %s and %b to produce strings.) I am thinking of adding %S and %B, which produce strings that are escaped. That would be a small function, and reasonably safe. Or rather, the safety is in the hands of the programmer. What you are suggesting doesn't provide a lot of value over sprintf the query first, then exec it. You can do what you are suggesting yourself, wrapping PQexec. A hypothetical wrapper would be implemented something like:
va_list ap;
char buf[BUFSZ];
va_start(ap, query);
vsnprintf(buf, BUFSZ, query, ap);
va_end(ap);
return PQexec(conn, buf);
This is a bad idea (security, escaping, performance)... we wrote a faster, safer way to do it, with richer type support. Or you can do it yourself. merlin I have used the libpqtypes library and it is very easy to use. +1 for adding it or something like it to the PostgreSQL core. I have people who will try and roll their own because it does not come with the core. While it is a hoot to see what reinventing the wheel produces, it is also prone to mistakes. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
That would be a *HUGE* piece of software compared to the relatively small thing I am suggesting... Sometimes complex and large solutions are required for the simplest of ideas. I believe this is one of those cases. You can't solve the printf-style PQexec properly by merely implementing a sprintf wrapper. As for escaping (or not escaping) of string arguments, that can be seen as a bug or a feature. I do not want automatic escaping of string arguments in all cases, e.g. I might want to construct an SQL statement with dynamic parts WHERE xy or AND a = b. hypothetical example:
filter = "WHERE name like 'Balmer%'";
if (sort == SORT_DESC)
    sort = "ORDER BY name DESCENDING";
PQvexec(conn, "SELECT name, nr, id FROM address %s%s", filter, sort);
So what I am aiming at right now is a PQvexec() function that basically has printf() like semantics, but adds an additional token to the format string (printf uses %s and %b to produce strings.) I am thinking of adding %S and %B, which produce strings that are escaped. This suffers from becoming cryptic over time; see Tom Lane's comments back in 2007 on this (http://archives.postgresql.org/pgsql-hackers/2007-12/msg00362.php). libpqtypes uses the human-readable %schema.typename (schema is optional) to specify format specifiers. There is no learning curve or ambiguity: if you want a point, then use %point, or %my_type. libpqtypes allows you to register aliases (PQregisterSubClasses) so that you can map %text to %s to make it feel more like C. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On 12/06/2010 10:22 AM, Robert Haas wrote: On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.12.2010 15:53, Robert Haas wrote: I guess. It still seems far too much like exposing the server's guts for my taste. It might not be as bad as the expression tree stuff, but there's nothing particularly good about it either. Note that we already have txid_current_snapshot() function, which exposes all that. Fair enough, and I think that's actually useful for Slonyc. But I don't think we should shy away of providing a cleaner API here. Just don't let the perfect get in the way of the good :P cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Mon, Dec 6, 2010 at 10:35 AM, Andrew Dunstan and...@dunslane.net wrote: On 12/06/2010 10:22 AM, Robert Haas wrote: On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.12.2010 15:53, Robert Haas wrote: I guess. It still seems far too much like exposing the server's guts for my taste. It might not be as bad as the expression tree stuff, but there's nothing particularly good about it either. Note that we already have txid_current_snapshot() function, which exposes all that. Fair enough, and I think that's actually useful for Slonyc. But I don't think we should shy away of providing a cleaner API here. Just don't let the perfect get in the way of the good :P I'll keep that in mind. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, then you need some sort of cross-backend communication, which is always a bit clumsy. A temp file seems quite sufficient, and not at all difficult. Not at all difficult is nonsense. To do that, you need to invent some mechanism for sender and receivers to identify which temp file they want to use, and you need to think of some way to clean up the files when the client forgets to tell you to do so. That's going to be at least as ugly as anything else. And I think it's unproven that this approach would be security-hole-free either. For instance, what about some other session overwriting pg_dump's snapshot temp file? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Mon, Dec 6, 2010 at 10:40 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, then you need some sort of cross-backend communication, which is always a bit clumsy. A temp file seems quite sufficient, and not at all difficult. Not at all difficult is nonsense. To do that, you need to invent some mechanism for sender and receivers to identify which temp file they want to use, Why is this even remotely hard? That's the whole point of having the publish operation return a token. The token either is, or uniquely identifies, the file name. and you need to think of some way to clean up the files when the client forgets to tell you to do so. That's going to be at least as ugly as anything else. Backends don't forget to call their end-of-transaction hooks, do they? They might crash, but we already have code to remove temp files on server restart. At most it would need minor adjustment. And I think it's unproven that this approach would be security-hole-free either. For instance, what about some other session overwriting pg_dump's snapshot temp file? Why would this be any different from any other temp file? We surely must have a mechanism in place to ensure that the temporary files used by sorts or hash joins don't get overwritten by some other session, or the system would be totally unstable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
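As a rough illustration of the token-is-the-filename idea (entirely schematic: the directory, names and numbers are made up, and none of the real end-of-transaction or crash-recovery cleanup is shown), the publish side could boil down to writing the opaque snapshot bytes under a token-derived path and handing the token back:

    #include <stdio.h>
    #include <stdlib.h>

    #define SNAPDIR "pg_snapshots"           /* hypothetical per-cluster temp directory */

    /* Write the opaque snapshot bytes to SNAPDIR/<token> and return the token (caller frees). */
    static char *
    publish_snapshot(const void *blob, size_t len, unsigned backend_pid, unsigned counter)
    {
        char    path[256];
        char   *token = malloc(64);
        FILE   *fp;

        if (token == NULL)
            return NULL;
        snprintf(token, 64, "%u-%u", backend_pid, counter);    /* unique per publishing backend */
        snprintf(path, sizeof(path), SNAPDIR "/%s", token);

        fp = fopen(path, "wb");
        if (fp == NULL || fwrite(blob, 1, len, fp) != len)
        {
            if (fp)
                fclose(fp);
            free(token);
            return NULL;
        }
        fclose(fp);
        return token;       /* the second connection presents this token to adopt the snapshot */
    }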
Re: [HACKERS] WIP patch for parallel pg_dump
On 12/06/2010 10:40 AM, Tom Lane wrote: Robert Haasrobertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, then you need some sort of cross-backend communication, which is always a bit clumsy. A temp file seems quite sufficient, and not at all difficult. Not at all difficult is nonsense. To do that, you need to invent some mechanism for sender and receivers to identify which temp file they want to use, and you need to think of some way to clean up the files when the client forgets to tell you to do so. That's going to be at least as ugly as anything else. And I think it's unproven that this approach would be security-hole-free either. For instance, what about some other session overwriting pg_dump's snapshot temp file? Yeah. I'm still not convinced that using shared memory is a bad way to pass these around. Surely we're not talking about large numbers of them. What am I missing here? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
IMO, it would be better to implement some utility functions to make it easy to construct arrays dynamically for PQexecParams and PQexecPrepared. This seems to me more universal solution and it is useful for both -- high level libpq-libraries authors and for those who like to use libpq directly. 2010/12/6 Andrew Chernow a...@esilo.com That would be a *HUGE* piece of software compared the relatively small thing I am suggesting... Sometimes complex and large solutions are required for the simplest of ideas. I believe this is one of those cases. You can't solve the printf style PQexec properly by merely implementing a sprintf wrapper. As for escaping (or not escaping) of string arguments, that can be seen as a bug or a feature. I do not wan't automatic escaping of string arguments in all cases, e.g. I might to construct an SQL statement with dynamic parts WHERE xy or AND a = b. hypothetical example: filter = WHERE name like 'Balmer%'; if (sort == SORT_DESC) sort = ORDER BY name DESCENDING; PQvexec(conn, SELECT name, nr, id FROM address %s%s, filter, sort); So what I am aiming at right now is a PQvexec() function that basically has printf() like semantics, but adds an additional token to the format string (printf uses %s and %b to produce strings.) I am thinking of adding %S and %B, which produce strings that are escaped. This suffers from becoming cryptic over time, see Tom Lane's comments back in 2007 on this ( http://archives.postgresql.org/pgsql-hackers/2007-12/msg00362.php). libpqtypes uses the human readable %schema.typename (schema is optional) to specify format specifiers. There is no learning curve or ambiguity, if you want a point than use %point, or %my_type libpqtypes allows you to register aliases (PQregisterSubClasses) so that you can map %text to %s to make it feel more like C.. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- // Dmitriy.
Re: [HACKERS] Suggesting a libpq addition
On 12/6/2010 11:40 AM, Dmitriy Igrishin wrote: IMO, it would be better to implement some utility functions to make it easy to construct arrays dynamically for PQexecParams and PQexecPrepared. This seems to me a more universal solution, and it is useful both for high-level libpq library authors and for those who like to use libpq directly. Hmm, your idea isn't better, it is identical to what libpqtypes already does :) http://libpqtypes.esilo.com/browse_source.html?file=exec.c We wrap PQexecParams and friends. You are coding libpq. We expended much effort to provide the same result interface (PGresult), including handling composites and arrays. You getf composites and arrays as PGresults, where a composite is a single-tuple, multiple-field result; an array is a multiple-tuple, single-field result; and composite arrays are multiple tuples and multiple fields. We've just made a more formal set of utility functions, typically called an API, in an attempt to match the coding standards of the postgresql project. There is no libpq param interface like there is for results, so we added the PGparam stuff. This allows you to pack parameters (PQputf) and then execute them. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
2010/12/6 Andrew Chernow a...@esilo.com On 12/6/2010 11:40 AM, Dmitriy Igrishin wrote: IMO, it would be better to implement some utility functions to make it easy to construct arrays dynamically for PQexecParams and PQexecPrepared. This seems to me a more universal solution, and it is useful both for high-level libpq library authors and for those who like to use libpq directly. Hmm, your idea isn't better, it is identical to what libpqtypes already does :) http://libpqtypes.esilo.com/browse_source.html?file=exec.c Actually I don't need this functionality :-). I've implemented a library in C++ which does many things, including making auto memory management, type conversion and binary transfers easy... But I believe that including the proposed utility functions would be better than a printf-like addition... Although both of these are excessive. We wrap PQexecParams and friends. You are coding libpq. We expended much effort to provide the same result interface (PGresult), including handling composites and arrays. You getf composites and arrays as PGresults, where a composite is a single-tuple, multiple-field result; an array is a multiple-tuple, single-field result; and composite arrays are multiple tuples and multiple fields. We've just made a more formal set of utility functions, typically called an API, in an attempt to match the coding standards of the postgresql project. There is no libpq param interface like there is for results, so we added the PGparam stuff. This allows you to pack parameters (PQputf) and then execute them. So, let libpq not be bloated. Let libpq remain a low-level library for projects like libpqtypes, pqxx and so on (my library too) ;-) -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- // Dmitriy.
Re: [HACKERS] allow COPY routines to read arbitrary numbers of fields
Andrew Dunstan and...@dunslane.net writes: Attached is a patch that allows CopyReadAttributesText() and CopyReadAttributesCSV() to read arbitrary numbers of attributes. Underflowing attributes are recorded as null, and space is made for overflowing attributes on a line. Why are you still passing nfields as a separate parameter instead of relying on the value you added to the struct? That can't do anything except cause confusion, especially once the two values diverge due to a previous array-expansion. Also, why did you change the setup code to not compute nfields in binary mode? That seems at best an unnecessary change, and at worst a breakage of the binary path --- did you test it? Also please be a little more careful with the formatting. This for instance is pretty sloppy:
! * strings. cstate->raw_fields[k] is set to point to the k'th attribute
! * string,
 * or NULL when the input matches the null marker string.
and there seem to be some gratuitous whitespace changes as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
I wrote: Tom Lane wrote: I assume this would have to be a hard definition of READ ONLY, not the rather squishy definition we use now? I'm excluding temporary tables from SSI on the grounds that they are only read and written by a single transaction and therefore can't be a source of rw-dependencies, and I'm excluding system tables on the grounds that they don't follow normal snapshot isolation rules. Hint bit rewrites are not an issue for SSI. Are there any other squishy aspects I might need to consider? I reviewed the documentation and played around with this a bit and can't find any areas where the current PostgreSQL implementation of READ ONLY is incompatible with what is needed for the SSI optimizations where it is used. There are a large number of tests which exercise this, and they're all passing. Did you have something in particular in mind which I should check? An example of code you think might break would be ideal, but anything more concrete than the word "squishy" would be welcome. Any thoughts on the original question about what to use as a heavyweight lock to support the subject feature? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FK's to refer to rows in inheritance child
Robert Haas robertmh...@gmail.com writes: On Sun, Dec 5, 2010 at 12:41 PM, Andrew Dunstan and...@dunslane.net wrote: Well, ISTM that amounts to not having official topic branches :-) I agree that this is supposed to be one of git's strengths (or more exactly a strength of distributed SCM's generally). I don't really see any great value in sanctifying a particular topic branch with some official status. I think the value in an official topic branch would be to allow formal incremental commit of large patches. In other words, we could decide that a commit to the official topic branch must meet the same standards of quality normally applied to a commit to the master branch, and must go through the same process. It would be understood that the topic branch would eventually be merged (with or without squash) back into the master branch, but that objections were to be raised as pieces were committed to the topic branch, not at merge time. The merge itself would require consensus as to timing, but we'd agree to take a dim view of I haven't reviewed anything that's been going on here for the last six months but now hate all of it. Topic branches defined that way seem like a pretty bad idea from here. They would save no effort at all for committers, because if you're not allowed to object to something after it's gone into a topic branch, then it's just like master in terms of having to keep up with changes as they happen. Moreover, we'd have to keep them in pretty close sync with master --- otherwise what happens when you discover that some long-ago change on master breaks the topic branch? So AFAICS this would just increase the amount of keeping-branches-in-sync dogwork without any offsetting advantage. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Andrew Dunstan and...@dunslane.net writes: Yeah. I'm still not convinced that using shared memory is a bad way to pass these around. Surely we're not talking about large numbers of them. What am I missing here? They're not of a very predictable size. Robert's idea of publish() returning a temp file identifier, which then gets removed at transaction end, might work all right. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Andrew Dunstan and...@dunslane.net writes: Why not just say give me the snapshot currently held by process ? There's not a unique snapshot held by a particular process. Also, we don't want to expend the overhead to fully publish every snapshot. I think it's really necessary that the sending process take some deliberate action to publish a snapshot. And please, not temp files if possible. Barring the cleanup issue, I don't see why not. This is a relatively low-usage feature, I think, so I wouldn't be much in favor of dedicating shmem to it even if the space requirement were predictable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Robert Haas robertmh...@gmail.com writes: One possible way to make an improvement in this area would be to move the responsibility for accepting connections out of the postmaster. Instead, you'd have a group of children that would all call accept() on the socket, and the OS would arbitrarily pick one to receive each new incoming connection. The postmaster would just be responsible for making sure that there were enough children hanging around. You could in fact make this change without doing anything else, in which case it wouldn't save any work but would possibly reduce connection latency a bit since more of the work could be done before the connection actually arrived. This seems like potentially a good idea independent of anything else, just to reduce connection latency: fork() (not to mention exec() on Windows) now happens before, not after, receipt of the connection request. However, I see a couple of stumbling blocks: 1. Does accept() work that way everywhere? (Windows, I'm looking at you.) 2. What do you do when max_connections is exceeded, and you don't have anybody at all listening on the socket? Right now we are at least able to send back an error message explaining the problem. Another issue that would require some thought is what algorithm the postmaster uses for deciding to spawn new children. But that doesn't sound like a potential showstopper. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
Robert Haas robertmh...@gmail.com writes: On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. When would it be superior to replace(replace(orig, from1, to1), from2, to2), ... An iterated replacement has different semantics from a simultaneous replace - replacing N placeholders with values simultaneously means you don't need to worry about the case where one of the replacement strings contains something that looks like a placeholder. Good point, but what the patch implements is in fact iterated replacement ... or at least it looked that way in a quick once-over. I actually think a simultaneous replacement feature would be quite handy but I make no comment on whether it belongs as part of this patch. My point is that the replacement stuff really really needs to be factored out of the string-execution stuff, precisely because the desired behavior is debatable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
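To make the iterated-versus-simultaneous distinction concrete, here is a minimal sketch using the ordinary two-argument replace(); the placeholder strings are invented purely for illustration:

    SELECT replace(replace('$1 and $2', '$1', 'uses $2'), '$2', 'X');
    -- iterated passes give 'uses X and X': the '$2' produced by the first
    -- replacement is rewritten again by the second pass.
    -- A simultaneous substitution of {$1 => 'uses $2', $2 => 'X'} would
    -- instead yield 'uses $2 and X'.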
Re: [HACKERS] profiling connection overhead
On 12/06/2010 09:38 AM, Tom Lane wrote: Another issue that would require some thought is what algorithm the postmaster uses for deciding to spawn new children. But that doesn't sound like a potential showstopper. We'd probably want a couple of different ones, optimized for different connection patterns, realistically. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On 12/06/2010 12:28 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Yeah. I'm still not convinced that using shared memory is a bad way to pass these around. Surely we're not talking about large numbers of them. What am I missing here? They're not of a very predictable size. Ah. Ok. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Mon, Dec 6, 2010 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: One possible way to do make an improvement in this area would be to move the responsibility for accepting connections out of the postmaster. Instead, you'd have a group of children that would all call accept() on the socket, and the OS would arbitrarily pick one to receive each new incoming connection. The postmaster would just be responsible for making sure that there were enough children hanging around. You could in fact make this change without doing anything else, in which case it wouldn't save any work but would possibly reduce connection latency a bit since more of the work could be done before the connection actually arrived. This seems like potentially a good idea independent of anything else, just to reduce connection latency: fork() (not to mention exec() on Windows) now happens before not after receipt of the connection request. However, I see a couple of stumbling blocks: 1. Does accept() work that way everywhere (Windows, I'm looking at you) Not sure. It might be useful to look at what Apache does, but I don't have time to do that ATM. 2. What do you do when max_connections is exceeded, and you don't have anybody at all listening on the socket? Right now we are at least able to send back an error message explaining the problem. Sending back an error message explaining the problem seems like a non-negotiable requirement. I'm not quite sure how to dance around this. Perhaps if max_connections is exhausted, the postmaster itself joins the accept() queue and launches a dead-end backend for each new connection. Or perhaps we reserve one extra backend slot for a probably-dead-end backend that will just sit there and mail rejection notices; except that if it sees that a regular backend slot has opened up it grabs it and turns itself into a regular backend. Another issue that would require some thought is what algorithm the postmaster uses for deciding to spawn new children. But that doesn't sound like a potential showstopper. The obvious algorithm would be to try to keep N spare workers around. Any time the number of unconnected backends drops below N the postmaster starts spawning new ones until it gets back up to N. I think the trick may not be the algorithm so much as finding a way to make the signaling sufficiently robust and lightweight. For example, I bet having each child that gets a new connection signal() the postmaster is a bad plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
At some point Hackers should look at pg vs MySQL multi tenantry but it is way tangential today. My understanding is that our schemas work like MySQL databases; and our databases are an even higher level of isolation. No? That's correct. Drizzle is looking at implementing a feature like our databases called catalogs (per the SQL spec). Let me stress that not everyone is happy with the MySQL multi-tenantry approach. But it does make multi-tenancy on a scale which you seldom see with PG possible, even if it has problems. It's worth seeing whether we can steal any of their optimization ideas without breaking PG. I was specifically looking at the login model, which works around the issue that we have: namely that different login ROLEs can't share a connection pool. In MySQL, they can share the built-in connection pool because role-switching effectively is a session variable. AFAICT, anyway. For that matter, if anyone knows any other DB which does multi-tenant well/better, we should be looking at them too. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation
On Dec 6, 2010, at 4:06 AM, Itagaki Takahiro wrote: * contrib/citext raises an encoding error when COLLATE is specified even if it is the same collation as the database default. We might need some special treatment for C locale. I've been wondering if this patch will support case-insensitive collations. If so, then citext should probably be revised to use one. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
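For reference, this is citext's current behavior, which a case-insensitive collation would ideally be able to replace (sketch assumes contrib/citext is already installed in the database):

    SELECT 'Foo'::citext = 'foo'::citext;   -- true: citext folds case internally
    SELECT 'Foo'::text = 'foo'::text;       -- false: plain text comparison is exact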
Re: [HACKERS] pg_execute_from_file review
On Dec 6, 2010, at 7:19 AM, Tom Lane wrote: On the whole I'd prefer not to have any substitution functionality hard-wired into pg_execute_file either, though I can see the argument that it's necessary for practical use. Basically I'm concerned that replace-equivalent behavior is not going to be satisfactory over the long run: I think eventually we're going to need to think about quoting/escaping behavior. So I think it's a bad idea to expose the assumption that it'll be done that way at the SQL level. +1 I suspect that, for the purposes of the extensions patch, if CREATE EXTENSION could be modified to handle setting the schema itself, without requiring that the file have this magic line: SET search_path = @extschema@; Then there would be no need for substitutions at all. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Mon, Dec 6, 2010 at 12:57 PM, Josh Berkus j...@agliodbs.com wrote: At some point Hackers should look at pg vs MySQL multi tenantry but it is way tangential today. My understanding is that our schemas work like MySQL databases; and our databases are an even higher level of isolation. No? That's correct. Drizzle is looking at implementing a feature like our databases called catalogs (per the SQL spec). Let me stress that not everyone is happy with the MySQL multi-tenantry approach. But it does make multi-tenancy on a scale which you seldom see with PG possible, even if it has problems. It's worth seeing whether we can steal any of their optimization ideas without breaking PG. Please make sure to articulate what you think is wrong with our existing model. I was specifically looking at the login model, which works around the issue that we have: namely that different login ROLEs can't share a connection pool. In MySQL, they can share the built-in connection pool because role-switching effectively is a session variable. AFAICT, anyway. Please explain more precisely what is wrong with SET SESSION AUTHORIZATION / SET ROLE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Mon, Dec 6, 2010 at 12:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. When would it be superior to replace(replace(orig, from1, to1), from2, to2), ... An iterated replacement has different semantics from a simultaneous replace - replacing N placeholders with values simultaneously means you don't need to worry about the case where one of the replacement strings contains something that looks like a placeholder. Good point, but what the patch implements is in fact iterated replacement ... or at least it looked that way in a quick once-over. Oh. Well, -1 from me for including that. I actually think a simultaneous replacement feature would be quite handy but I make no comment on whether it belongs as part of this patch. My point is that the replacement stuff really really needs to be factored out of the string-execution stuff, precisely because the desired behavior is debatable. +1 for committing the uncontroversial parts separately. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] allow COPY routines to read arbitrary numbers of fields
On 12/06/2010 12:11 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Attached is a patch that allows CopyReadAttributesText() and CopyReadAttributesCSV() to read arbitrary numbers of attributes. Underflowing attributes are recorded as null, and space is made for overflowing attributes on a line. Why are you still passing nfields as a separate parameter instead of relying on the value you added to the struct? That can't do anything except cause confusion, especially once the two values diverge due to a previous array-expansion. Good point. Will fix. Also, why did you change the setup code to not compute nfields in binary mode? That seems at best an unnecessary change, and at worst a breakage of the binary path --- did you test it? AFAICT it's not used in binary mode at all. But I will double check. Also please be a little more careful with the formatting. Ok, will fix also. Thanks for the comments. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation
2010/12/6 David E. Wheeler da...@kineticode.com: On Dec 6, 2010, at 4:06 AM, Itagaki Takahiro wrote: * contrib/citext raises an encoding error when COLLATE is specified even if it is the same collation as the database default. We might need some special treatment for C locale. I've been wondering if this patch will support case-insensitive collations. If so, then citext should probably be revised to use one. As far as I know, no. It supports only system-based collations. Pavel Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Tom Lane t...@sss.pgh.pa.us wrote: I'm still not convinced that using shared memory is a bad way to pass these around. Surely we're not talking about large numbers of them. What am I missing here? They're not of a very predictable size. Surely you can predict that any snapshot is no larger than a fairly small fixed portion plus sizeof(TransactionId) * MaxBackends? So, for example, if you're configured for 100 connections, you'd be limited to something under 1kB, maximum? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
Kevin Grittner kevin.gritt...@wicourts.gov writes: I reviewed the documentation and played around with this a bit and can't find any areas where the current PostgreSQL implementation of READ ONLY is incompatible with what is needed for the SSI optimizations where it is used. There are a large number of tests which exercise this, and they're all passing. Did you have something in particular in mind which I should check? I did not, just thought it was a point that merited examination. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] allow COPY routines to read arbitrary numbers of fields
Andrew Dunstan and...@dunslane.net writes: On 12/06/2010 12:11 PM, Tom Lane wrote: Also, why did you change the setup code to not compute nfields in binary mode? That seems at best an unnecessary change, and at worst a breakage of the binary path --- did you test it? AFAICT it's not used in binary mode at all. But I will double check. Well, even if it is not used at the moment, it seems potentially of use in that path. So I'd vote for continuing to set it correctly, rather than making it deliberately incorrect as this patch is going out of its way to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I'm still not convinced that using shared memory is a bad way to pass these around. Surely we're not talking about large numbers of them. What am I missing here? They're not of a very predictable size. Surely you can predict that any snapshot is no larger than a fairly small fixed portion plus sizeof(TransactionId) * MaxBackends? No. See subtransactions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Surely you can predict that any snapshot is no larger than a fairly small fixed portion plus sizeof(TransactionId) * MaxBackends? No. See subtransactions. Subtransactions are included in snapshots? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] allow COPY routines to read arbitrary numbers of fields
On 12/06/2010 01:23 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 12/06/2010 12:11 PM, Tom Lane wrote: Also, why did you change the setup code to not compute nfields in binary mode? That seems at best an unnecessary change, and at worst a breakage of the binary path --- did you test it? AFAICT it's not used in binary mode at all. But I will double check. Well, even if it is not used at the moment, it seems potentially of use in that path. So I'd vote for continuing to set it correctly, rather than making it deliberately incorrect as this patch is going out of its way to do. Ok. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
David E. Wheeler da...@kineticode.com writes: On Dec 6, 2010, at 7:19 AM, Tom Lane wrote: On the whole I'd prefer not to have any substitution functionality hard-wired into pg_execute_file either, though I can see the argument that it's necessary for practical use. Basically I'm concerned that replace-equivalent behavior is not going to be satisfactory over the long run: I think eventually we're going to need to think about quoting/escaping behavior. So I think it's a bad idea to expose the assumption that it'll be done that way at the SQL level. +1 I suspect that, for the purposes of the extensions patch, if CREATE EXTENSION could be modified to handle setting the schema itself, without requiring that the file have this magic line: SET search_path = @extschema@; Then there would be no need for substitutions at all. That's an interesting idea, but I'm not sure it's wise to design around the assumption that we won't need substitutions ever. What I was thinking was that we should try to limit knowledge of the substitution behavior to the extension definition files and the implementation of CREATE EXTENSION itself. I don't agree with exposing that information at the SQL level. (On the other hand, if we *could* avoid using any explicit substitutions, it would certainly ease testing of extension files no? They'd be sourceable into psql then.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: No. See subtransactions. Subtransactions are included in snapshots? Sure, see GetSnapshotData(). You could avoid it by setting suboverflowed, but that comes at a nontrivial performance cost. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Dec 6, 2010, at 10:43 AM, Tom Lane wrote: That's an interesting idea, but I'm not sure it's wise to design around the assumption that we won't need substitutions ever. What I was thinking was that we should try to limit knowledge of the substitution behavior to the extension definition files and the implementation of CREATE EXTENSION itself. I don't agree with exposing that information at the SQL level. (On the other hand, if we *could* avoid using any explicit substitutions, it would certainly ease testing of extension files no? They'd be sourceable into psql then.) Yes. And extension authors would not have to remember to include the magic line (which at any rate would break extensions for earlier versions of PostgreSQL). Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: No. See subtransactions. Subtransactions are included in snapshots? Sure, see GetSnapshotData(). You could avoid it by setting suboverflowed, but that comes at a nontrivial performance cost. Yeah, sorry for blurting like that before I checked. I was somewhat panicked that I'd missed something important for SSI, because my XidIsConcurrent check just uses xmin, xmax, and xip; I was afraid what I have would fall down in the face of subtransactions. But on review I found that I'd thought that through and (discussion in the archives) I always wanted to associate the locks and conflicts with the top level transaction; so that was already identified before checking for overlap, and it was therefore more efficient to just check that. Sorry for the senior moment. :-/ Perhaps a line or two of comments about that in the SSI patch would be a good idea. And maybe some tests involving subtransactions. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
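A sketch of the kind of subtransaction test that could be added, assuming the SSI patch's semantics; the table and column names are invented, and the point is simply that any rw-conflict created by the read inside the savepoint should be attributed to the top-level transaction:

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SAVEPOINT s1;
    SELECT sum(balance) FROM accounts WHERE branch = 1;   -- read under a subtransaction
    RELEASE SAVEPOINT s1;
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    COMMIT;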
Re: [HACKERS] pg_execute_from_file review
David E. Wheeler da...@kineticode.com writes: On Dec 6, 2010, at 10:43 AM, Tom Lane wrote: (On the other hand, if we *could* avoid using any explicit substitutions, it would certainly ease testing of extension files no? They'd be sourceable into psql then.) Yes. And extension authors would not have to remember to include the magic line (which at any rate would break extensions for earlier versions of PostgreSQL). Well, I don't put any stock in the idea that it's important for existing module .sql files to be usable as-is as extension definition files. If it happens to fall out that way, fine, but we shouldn't give up anything else to get that. Letting extension files be directly sourceable in psql is probably worth a bit more, but I'm not sure how much. The argument that forgetting to include a magic source_path command would make CREATE EXTENSION behave surprisingly seems to have a good deal of merit though, certainly enough to justify having CREATE EXTENSION take care of that internally if at all possible. The real question in my mind is whether there are any other known or foreseeable cases where we would need to have substitution capability and there's not another good way to handle it. I haven't been paying real close attention to the threads about this patch --- do we have any specific use-cases in mind for substitution, besides this one? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Dec 6, 2010, at 11:12 AM, Tom Lane wrote: Well, I don't put any stock in the idea that it's important for existing module .sql files to be usable as-is as extension definition files. If it happens to fall out that way, fine, but we shouldn't give up anything else to get that. I agree, but I don't think we have to lose anything. Letting extension files be directly sourceable in psql is probably worth a bit more, but I'm not sure how much. The argument that forgetting to include a magic source_path command would make CREATE EXTENSION behave surprisingly seems to have a good deal of merit though, certainly enough to justify having CREATE EXTENSION take care of that internally if at all possible. Yes. The other question I have, though, is how important is it to have extensions live in a particular schema since there seems to be no advantage to doing so. With the current patch, I can put extension foo in schema bar, but I can't put any other extension named foo in any other schema. It's in schema bar but is at the same time global. That doesn't make much sense to me. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation
On mån, 2010-12-06 at 21:06 +0900, Itagaki Takahiro wrote: On Sun, Dec 5, 2010 at 01:04, Peter Eisentraut pete...@gmx.net wrote: Here is an updated patch to address the issues discussed during this commitfest. Here are comments and questions after I tested the latest patch: Issues * initdb itself seems to succeed, but it says could not determine encoding for locale messages for any combination of encoding=utf8/eucjp and locale=ja_JP.utf8/ja_JP.eucjp/C. Is it an expected behavior?
creating collations ...initdb: locale name has non-ASCII characters, skipped: bokmål
initdb: locale name has non-ASCII characters, skipped: français
could not determine encoding for locale hy_AM.armscii8: codeset is ARMSCII-8
... (a dozen lines) ...
could not determine encoding for locale vi_VN.tcvn: codeset is TCVN5712-1
ok
What this does is take the output of locale -a and populate the pg_collation catalog with the locales it finds. When it finds an operating system locale that uses an encoding that is not recognized, you will see this warning. I understand that that would probably annoy users. We could hide the warning and silently skip those locales. But then could that hide genuine configuration problems? * contrib/citext raises an encoding error when COLLATE is specified even if it is the same collation as the database default. We might need some special treatment for C locale.
=# SHOW lc_collate;
== C
=# SELECT ('A'::citext) = ('a'::citext);
== false
=# SELECT ('A'::citext) = ('a'::citext) COLLATE C;
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.
OK, I can reproduce that. That's fallout from the lc_ctype_is_c() optimization that I removed, as explained in another email. I'll have to think about that again. * pg_dump would generate unportable files for different platforms because collation names pg_dump can already produce unportable files for a number of other reasons, including per-database locale, tablespaces, OS-dependent configuration settings. The way I imagine this working is that someone who wants to design a genuinely portable application using this feature would create their own collation based on the existing, OS-specific collation (using a to-be-added CREATE COLLATION command). As mentioned earlier, however, we can't actually solve the problem that the OS locales may not behave the same across systems. Source code * PG_GETARG_COLLATION might be a better name than PG_GET_COLLATION. It's not the collation of a function argument, it's the collation of a function call. (You could conceivably also fetch the collation of a function argument, but that isn't used in any way.) * What is the difference between InvalidOid and DEFAULT_COLLATION_OID for collation oids? The patch replaces DirectFunctionCall with DirectFunctionCallC in some places, but we could shrink the diff size if we can use InvalidOid instead of DEFAULT_COLLATION_OID. Think of DEFAULT_COLLATION_OID as analogous to UNKNOWNOID. A long time ago we used InvalidOid for all kinds of types, including unknown, pseudotypes, cstring, and no type at all. The reason we changed this was that this masked errors and made processing of the unknown type difficult/impossible. I know this makes the code bigger, but it's necessary. I originally coded the patch using InvalidOid for everything, but that wasn't very robust. This also ties into the next question ... 
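A sketch of the portability workflow described above, using the to-be-added CREATE COLLATION command; the syntax is illustrative only, since the command is not committed yet, and the names are invented:

    CREATE COLLATION app_german (LOCALE = 'de_DE.utf8');   -- wraps an OS-specific locale
    CREATE TABLE customers (name text COLLATE app_german);
    -- a dump then references the stable name app_german rather than the
    -- underlying OS locale string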
* I still think an explicit passing collations from-function-to-function is horrible because we might forget it in some places, and almost existing third party module won't work. Is it possible to make it a global variable, and push/pop the state when changed? Sorry I'm missing something, but I think we could treat the collation setting as like as GUC settings. A collation is a property of a datum or an expression. You might as well argue that we don't keep track of types of expressions and instead store it globally. Doesn't make sense. Extensions are not required to support collations. Those that might want to will usually end up calling one of the locale-enabled functions such as varstr_cmp(), and there the function prototype will ensure that specifying a collation cannot be missed. Additionally, the distinction of InvalidOid and DEFAULT_COLLATION_OID does a great deal to ensure that in case a collation is unspecified or missing in some new code, you will get a proper error message instead of unspecified behavior. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation
On mån, 2010-12-06 at 10:01 -0800, David E. Wheeler wrote: I've been wondering if this patch will support case-insensitive collations. If so, then citext should probably be revised to use one. This has been touched upon several times during the discussions on past patches. Essentially, the current patch only arranges that you can specify a sort order for data. The system always breaks ties using a binary comparison. This could conceivably be changed, but it's a separate problem. Some of the necessary investigation work has presumably already been done in the context of citext. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
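A minimal sketch of that distinction, assuming the per-column collation patch is applied and a de_DE.utf8 locale exists on the system:

    SELECT w FROM (VALUES ('Bar'::text), ('bar'), ('baz')) AS t(w)
    ORDER BY w COLLATE "de_DE.utf8";                   -- collation controls the sort order
    SELECT 'Bar' = ('bar'::text COLLATE "de_DE.utf8"); -- still false: equality ultimately
                                                       -- falls back to a binary comparison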
Re: [HACKERS] pg_execute_from_file review
David E. Wheeler da...@kineticode.com writes: The other question I have, though, is how important is it to have extensions live in a particular schema since there seems to be no advantage to doing so. With the current patch, I can put extension foo in schema bar, but I can't put any other extension named foo in any other schema. It's in schema bar but is at the same time global. That doesn't make much sense to me. There's a difference between whether an extension as such is considered to belong to a schema and whether its contained objects do. We can't really avoid the fact that functions, operators, etc must be assigned to some particular schema. It seems not particularly important that extension names be schema-qualified, though --- the use-case for having two different extensions named foo installed simultaneously seems pretty darn small. On the other hand, if we were enforcing that all objects contained in an extension belong to the same schema, it'd make logistical sense to consider that the extension itself belongs to that schema as well. But last I heard we didn't want to enforce such a restriction. I believe what the search_path substitution is actually about is to provide a convenient shorthand for the case that all the contained objects do indeed live in one schema, and you'd like to be able to select that schema at CREATE EXTENSION time. Which seems like a useful feature for a common case. We've certainly heard multiple complaints about the fact that you can't do that easily now. BTW, I did think of a case where substitution solves a problem we don't presently have any other solution for: referring to the target schema within the definition of a contained object. As an example, you might wish to attach SET search_path = @target_schema@ to the definition of a SQL function in an extension, to prevent search-path-related security issues in the use of the function. Without substitution you'll be reduced to hard-wiring the name of the target schema. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
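A sketch of that last case, as it might appear in an extension script; @target_schema@ stands for whatever placeholder CREATE EXTENSION ends up substituting, and the function and table names are invented:

    CREATE FUNCTION lookup_widget(integer) RETURNS text
        LANGUAGE sql STABLE
        SET search_path = @target_schema@
        AS $$ SELECT name FROM widgets WHERE widget_id = $1 $$;
    -- without substitution, the schema name would have to be hard-wired here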
Re: [HACKERS] Per-column collation
On Dec 6, 2010, at 11:29 AM, Peter Eisentraut wrote: This has been touch upon several times during the discussions on past patches. Essentially, the current patch only arranges that you can specify a sort order for data. The system always breaks ties using a binary comparison. This could conceivably be changed, but it's a separate problem. Some of the necessary investigation work has presumably already been done in the context of citext. Okay, thanks, good to know. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Label switcher function
2010/11/25 KaiGai Kohei kai...@ak.jp.nec.com: The attached patch is a revised one. It provides two hooks; the one informs core PG whether the supplied function needs to be hooked, or not. the other is an actual hook on prepare, start, end and abort of function invocations. typedef bool (*needs_function_call_type)(Oid fn_oid); typedef void (*function_call_type)(FunctionCallEventType event, FmgrInfo *flinfo, Datum *private); The hook prototype was a bit modified since the suggestion from Robert. Because FmgrInfo structure contain OID of the function, it might be redundant to deliver OID of the function individually. Rest of parts are revised according to the comment. I also fixed up source code comments which might become incorrect. FCET_PREPARE looks completely unnecessary to me. Any necessary one-time work can easily be done at FCET_START time, assuming that the private-data field is initialized to (Datum) 0. I'm fairly certain that the following is not portable: + ObjectAddress object = { .classId = ProcedureRelationId, + .objectId = fn_oid, + .objectSubId = 0 }; I'd suggest renaming needs_function_call_type and function_call_type to needs_fmgr_hook_type and fmgr_hook_type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Please explain more precisely what is wrong with SET SESSION AUTHORIZATION / SET ROLE. 1) Session GUCS do not change with a SET ROLE (this is a TODO I haven't had any time to work on) 2) Users can always issue their own SET ROLE and then hack into other users' data. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
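A sketch of problem 2 on a pooled connection, assuming the pool logs in as a role that is a member of every tenant role; the role and table names are invented:

    SET ROLE alice;                -- pooler switches the session to tenant "alice"
    SELECT * FROM alice_private;   -- intended access
    RESET ROLE;                    -- nothing stops the client from undoing that ...
    SET ROLE bob;                  -- ... and impersonating another tenant
    SELECT * FROM bob_private;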
Re: [HACKERS] Per-column collation
Please, it would be very important to us that LIKE worked with the Brazilian collation, and possibly also case-insensitive and accent-insensitive matching. Thanks! Alexandre Riveira Brazil Peter Eisentraut escreveu: On mån, 2010-12-06 at 10:01 -0800, David E. Wheeler wrote: I've been wondering if this patch will support case-insensitive collations. If so, then citext should probably be revised to use one. This has been touched upon several times during the discussions on past patches. Essentially, the current patch only arranges that you can specify a sort order for data. The system always breaks ties using a binary comparison. This could conceivably be changed, but it's a separate problem. Some of the necessary investigation work has presumably already been done in the context of citext. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Dec 6, 2010, at 11:36 AM, Tom Lane wrote: There's a difference between whether an extension as such is considered to belong to a schema and whether its contained objects do. We can't really avoid the fact that functions, operators, etc must be assigned to some particular schema. Right, of course. It seems not particularly important that extension names be schema-qualified, though --- the use-case for having two different extensions named foo installed simultaneously seems pretty darn small. On the other hand, if we were enforcing that all objects contained in an extension belong to the same schema, it'd make logistical sense to consider that the extension itself belongs to that schema as well. But last I heard we didn't want to enforce such a restriction. Okay. I believe what the search_path substitution is actually about is to provide a convenient shorthand for the case that all the contained objects do indeed live in one schema, and you'd like to be able to select that schema at CREATE EXTENSION time. Which seems like a useful feature for a common case. We've certainly heard multiple complaints about the fact that you can't do that easily now. Yes, it *is* useful. But what happens if I have SET search_path = whatever; in my extension install script, and someone executes CREATE EXTENSION FOO WITH SCHEMA bar? Surprise! Everything is in whatever, not in bar. BTW, I did think of a case where substitution solves a problem we don't presently have any other solution for: referring to the target schema within the definition of a contained object. As an example, you might wish to attach SET search_path = @target_schema@ to the definition of a SQL function in an extension, to prevent search-path-related security issues in the use of the function. Without substitution you'll be reduced to hard-wiring the name of the target schema. You lost me. :-( David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
On Dec5, 2010, at 16:11 , Kevin Grittner wrote: The simple way to implement SERIALIZABLE READ ONLY DEFERRABLE under SSI would be to have each non-read-only serializable transaction acquire a heavyweight lock which can coexist with other locks at the same level (SHARE looks good) on some common object and hold that for the duration of the transaction, while a SERIALIZABLE READ ONLY DEFERRABLE transaction would need to acquire a conflicting lock (EXCLUSIVE looks good) before it could acquire a snapshot, and release the lock immediately after acquiring the snapshot. Hm, so once a SERIALIZABLE READ ONLY DEFERRABLE is waiting to acquire the lock, no other transaction would be allowed to start until the SERIALIZABLE READ ONLY DEFERRABLE transaction has been able to acquire its snapshot. For pg_dump's purposes at least, that seems undesirable, since a single long-running transaction at the time you start pg_dump would effectively DoS your system until the long-running transaction finishes. The alternative seems to be to drop the guarantee that a SERIALIZABLE READ ONLY DEFERRABLE won't be starved forever by a stream of overlapping non-READ ONLY transactions. Then a flag in the proc array that marks non-READ ONLY transactions should be sufficient, plus a wait-and-retry loop to take snapshots for SERIALIZABLE READ ONLY DEFERRABLE transactions. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
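For reference, the usage under discussion would look something like this (proposed syntax, not yet committed); the transaction waits at snapshot acquisition until a safe snapshot is available and can then run without SSI bookkeeping:

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
    SELECT count(*) FROM some_large_table;   -- e.g. a long report or pg_dump-style read
    COMMIT;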
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: IIRC, in old discussions of this problem we first considered allowing clients to pull down an explicit representation of their snapshot (which actually is an existing feature now, txid_current_snapshot()) and then upload that again to become the active snapshot in another connection. Could a hot standby use such a snapshot representation? I.e. same snapshot on the master and the standby? Greetings Marcin Mańk -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
Greg Smith g...@2ndquadrant.com writes: Regardless, I'm now leaning heavily toward the idea of avoiding open_datasync by default given this bug, and backpatching that change to at least 8.4. I'll do some more database-level performance tests here just as a final sanity check on that. My gut feel is now that we'll eventually be taking something like Marti's patch, adding some more documentation around it, and applying that to HEAD as well as some number of back branches. I think we have got consensus that (1) open_datasync should not be the default on Linux, and (2) this change needs to be back-patched. What is not clear to me is whether we have consensus to change the option preference order globally, or restrict the change to just be effective on Linux. The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. However, it seems reasonable to me to suppose that open_datasync could only be a win in very restricted scenarios and thus shouldn't be a preferred default. Also, I dread trying to document the behavior if the preference order becomes platform-dependent. With the holidays fast approaching, our window to do something about this in a timely fashion grows short. If we don't schedule update releases to be made this week, I think we're looking at not getting the updates out till after New Year's. Do we want to wait that long? Is anyone actually planning to do performance testing that would prove anything about non-Linux platforms? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On 06.12.2010 21:48, marcin mank wrote: On Sun, Dec 5, 2010 at 7:28 PM, Tom Lanet...@sss.pgh.pa.us wrote: IIRC, in old discussions of this problem we first considered allowing clients to pull down an explicit representation of their snapshot (which actually is an existing feature now, txid_current_snapshot()) and then upload that again to become the active snapshot in another connection. Could a hot standby use such a snapshot representation? I.e. same snapshot on the master and the standby? Hmm, I suppose it could. That's an interesting idea, you could run parallel pg_dump or something else against master and/or multiple hot standby servers, all working on the same snapshot. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
marcin mank marcin.m...@gmail.com writes: On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: IIRC, in old discussions of this problem we first considered allowing clients to pull down an explicit representation of their snapshot (which actually is an existing feature now, txid_current_snapshot()) and then upload that again to become the active snapshot in another connection. Could a hot standby use such a snapshot representation? I.e. same snapshot on the master and the standby? Hm, that's a good question. It seems like it's at least possibly workable, but I'm not sure if there are any showstoppers. The other proposal of publish-a-snapshot would presumably NOT support this, since we'd not want to ship the snapshot temp files down the WAL stream. However, if you were doing something like parallel pg_dump you could just run the parent and child instances all against the slave, so the pg_dump scenario doesn't seem to offer much of a supporting use-case for worrying about this. When would you really need to be able to do it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
Florian Pflug f...@phlo.org wrote: On Dec5, 2010, at 16:11 , Kevin Grittner wrote: The simple way to implement SERIALIZABLE READ ONLY DEFERRABLE under SSI would be to have each non-read-only serializable transaction acquire a heavyweight lock which can coexist with other locks at the same level (SHARE looks good) on some common object and hold that for the duration of the transaction, while a SERIALIZABLE READ ONLY DEFERRABLE transaction would need to acquire a conflicting lock (EXCLUSIVE looks good) before it could acquire a snapshot, and release the lock immediately after acquiring the snapshot. Hm, so once a SERIALIZABLE READ ONLY DEFERRABLE is waiting to acquire the lock, no other transaction would be allowed to start until the SERIALIZABLE READ ONLY DEFERRABLE transaction has been able to acquire its snapshot. For pg_dump's purposes at least, that seems undesirable, since a single long-running transaction at the time you start pg_dump would effectively DoS your system until the long-running transaction finishes. Well, when you put it that way, it sounds pretty grim. :-( Since one of the bragging points of SSI is that it doesn't introduce any blocking beyond current snapshot isolation, I don't want to do something here which blocks anything except the transaction which has explicitly requested the DEFERRABLE property. I guess that, simple as that technique might be, it just isn't a good idea. The alternative seems to be to drop the guarantee that a SERIALIZABLE READ ONLY DEFERRABLE won't be starved forever by a stream of overlapping non-READ ONLY transactions. Then a flag in the proc array that marks non-READ ONLY transactions should be sufficient, plus a wait-and-retry loop to take snapshots for SERIALIZABLE READ ONLY DEFERRABLE transactions. If I can find a way to pause an active process, I already have functions in which I maintain the count of active SERIALIZABLE READ WRITE transactions as they begin and end -- I could release pending DEFERRABLE transactions when the count hits zero without any separate loop. That has the added attraction of being a path to the more complex checking which could allow the deferrable process to start sooner in some circumstances. The simple solution with the heavyweight lock would not have been a good path to that. What would be the correct way for a process to put itself to sleep, and for another process to later wake it up? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
On 06.12.2010 22:53, Kevin Grittner wrote: What would be the correct way for a process to put itself to sleep, and for another process to later wake it up? See ProcWaitSignal/ProcSendSignal. Or the new 'latch' code. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.12.2010 22:53, Kevin Grittner wrote: What would be the correct way for a process to put itself to sleep, and for another process to later wake it up? See ProcWaitSignal/ProcSendSignal. Or the new 'latch' code. Is there a reason to prefer one over the other? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
However, if you were doing something like parallel pg_dump you could just run the parent and child instances all against the slave, so the pg_dump scenario doesn't seem to offer much of a supporting use-case for worrying about this. When would you really need to be able to do it? If you had several standbys, you could distribute the work of the pg_dump among them. This would be a huge speedup for a large database, potentially, thanks to parallelization of I/O and network. Imagine doing a pg_dump of a 300GB database in 10min. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. Survey of some popular platforms:
Linux: don't want O_DIRECT by default for reliability reasons, and there's no clear performance win in the default config with small wal_buffers
Solaris: O_DIRECT doesn't work, there's another API support has never been added for; see http://blogs.sun.com/jkshah/entry/postgresql_wal_sync_method_and
Windows: Small reported gains for O_DIRECT, i.e. 10% at http://archives.postgresql.org/pgsql-hackers/2007-03/msg01615.php
FreeBSD: It probably works there, but I've never seen good performance tests of it on this platform.
Mac OS X: Like Solaris, there's a similar mechanism but it's not O_DIRECT; see http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag for notes about the F_NOCACHE feature used. Same basic situation as Solaris; there's an API, but PostgreSQL doesn't use it yet.
So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
[HACKERS] the number of file descriptors when using POSIX semaphore
Hi, folks, in src/template/darwin: # Select appropriate semaphore support. Darwin 6.0 (Mac OS X 10.2) and up # support System V semaphores; before that we have to use POSIX semaphores, # which are less good for our purposes because they eat a file descriptor # per backend per max_connection slot. To my understanding, the number of descriptors created by POSIX semaphores would be # of actual clients times max_connections. However, I monitor the number of open files using sysctl, and I find that kern.num_files doesn't match the result calculated by the formula I inferred... So, what would the number of file descriptors be, when using POSIX semaphores? Thanks!
Re: [HACKERS] WIP patch for parallel pg_dump
Josh Berkus j...@agliodbs.com writes: However, if you were doing something like parallel pg_dump you could just run the parent and child instances all against the slave, so the pg_dump scenario doesn't seem to offer much of a supporting use-case for worrying about this. When would you really need to be able to do it? If you had several standbys, you could distribute the work of the pg_dump among them. This would be a huge speedup for a large database, potentially, thanks to parallelization of I/O and network. Imagine doing a pg_dump of a 300GB database in 10min. That does sound kind of attractive. But to do that I think we'd have to go with the pass-the-snapshot-through-the-client approach. Shipping internal snapshot files through the WAL stream doesn't seem attractive to me. While I see Robert's point about preferring not to expose the snapshot contents to clients, I don't think it outweighs all other considerations here; and every other one is pointing to doing it the other way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
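A sketch of the pass-through-the-client flow; txid_current_snapshot() already exists, while the import half is purely hypothetical here and shown only to illustrate the idea (function name invented):

    -- on the coordinating connection:
    SELECT txid_current_snapshot();              -- e.g. '1234:1250:1237,1240'
    -- on each worker connection (hypothetical counterpart):
    -- SELECT pg_import_snapshot('1234:1250:1237,1240');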
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
On 12/5/10 2:12 PM, Greg Smith wrote: Josh Berkus wrote: I modified test_fsync in two ways to run this; first, to make it support O_DIRECT, and second to make it run in the *current* directory. Patch please? I agree with the latter change; what test_fsync does is surprising. Attached. Making it support O_DIRECT would be possible but more complex; I don't see the point unless we think we're going to have open_sync_with_odirect as a separate option. I suggested a while ago that we refactor test_fsync to use the same set of source code as the database itself for detecting things related to wal_sync_method, perhaps just extract that whole set of DEFINE macro logic to somewhere else. That happened at a bad time in the development cycle (right before a freeze) and nobody ever got back to the idea afterwards. If this code is getting touched, and it's clear it is in some direction, I'd like to see things change so it's not possible for the two to diverge again afterwards. I don't quite follow you. Maybe nobody else did last time, either. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com

diff --git a/src/tools/fsync/test_fsync.c b/src/tools/fsync/test_fsync.c
index 28c2119..12a83e1 100644
*** a/src/tools/fsync/test_fsync.c
--- b/src/tools/fsync/test_fsync.c
***************
*** 23,34 ****
  #include <string.h>

! #ifdef WIN32
  #define FSYNC_FILENAME	"./test_fsync.out"
- #else
- /* /tmp might be a memory file system */
- #define FSYNC_FILENAME	"/var/tmp/test_fsync.out"
- #endif

  #define WRITE_SIZE (8 * 1024)	/* 8k */
--- 23,32 ----
  #include <string.h>

! /* put the temp files in the local directory
!    this is a change from older versions which used
!    /var/tmp */
  #define FSYNC_FILENAME	"./test_fsync.out"

  #define WRITE_SIZE (8 * 1024)	/* 8k */

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 06:56 PM, Greg Smith wrote: Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. Survey of some popular platforms: snip So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about. If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
Greg Smith g...@2ndquadrant.com writes: So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about. To my mind, O_DIRECT is not really the key issue here, it's whether to prefer O_DSYNC or fdatasync. I looked back in the archives, and I think that the main reason we prefer O_DSYNC when available is the results I got here: http://archives.postgresql.org/pgsql-hackers/2001-03/msg00381.php which demonstrated a performance benefit on HPUX 10.20, though with a test tool much more primitive than test_fsync. I still have that machine, although the disk that was in it at the time died a while back. What's in there now is a Seagate ST336607LW spinning at 10000 RPM (166 rev/sec), and today I get numbers like this from test_fsync:

Simple write:
        8k write                        28331.020/second

Compare file sync methods using one write:
        open_datasync 8k write            161.190/second
        open_sync 8k write                156.478/second
        8k write, fdatasync                54.302/second
        8k write, fsync                    51.810/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes          81.702/second
        2 open_sync 8k writes              80.172/second
        8k write, 8k write, fdatasync      40.829/second
        8k write, 8k write, fsync          39.836/second

Compare open_sync with different sizes:
        open_sync 16k write                80.192/second
        2 open_sync 8k writes              78.018/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close             52.527/second
        8k write, close, fsync             54.092/second

So *on that rather ancient platform* there's a measurable performance benefit to O_DSYNC, but this seems to be largely because fdatasync is stubbed to fsync in userspace rather than because fdatasync wouldn't be a better idea in the abstract. Also, a lot of the argument against fsync at the time was that it forced the kernel to iterate through all the buffers for the WAL file to see if any were dirty. I would imagine that modern kernels are a tad smarter about that; and even if they aren't, the CPU speed versus disk speed tradeoff has changed enough since 2001 that iterating through 16MB of buffers isn't as interesting as it was then. So to my mind, switching to the preference order fdatasync, fsync_writethrough, fsync seems like the thing to do. Since we assume fsync is always available, that means that O_DSYNC/O_SYNC will not be the defaults on any platform. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
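For readers less familiar with the two flush styles being compared, here is a minimal, self-contained C sketch of the difference between open_datasync-style and fdatasync-style flushing. The file name and buffer are placeholders, and this only illustrates the system calls involved; it is not the backend's actual WAL code. Note that O_DSYNC is not available everywhere, which is part of why the preference order matters.

/* flush_styles.c -- illustrative comparison only, not backend code */
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    char buf[8192] = {0};       /* one 8k WAL-sized block, zero-filled */

    /* open_datasync style: with O_DSYNC, each write() returns only after
     * the data has reached stable storage. */
    int fd = open("walfile", O_WRONLY | O_CREAT | O_DSYNC, 0600);
    write(fd, buf, sizeof(buf));
    close(fd);

    /* fdatasync style: write normally, then flush the file's data (but
     * not unrelated metadata such as the modification time) separately. */
    fd = open("walfile", O_WRONLY | O_CREAT, 0600);
    write(fd, buf, sizeof(buf));
    fdatasync(fd);
    close(fd);

    return 0;
}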
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Josh Berkus j...@agliodbs.com writes: Making it support O_DIRECT would be possible but more complex; I don't see the point unless we think we're going to have open_sync_with_odirect as a separate option. Whether it's complex or not isn't really the issue. The issue is that what test_fsync is testing had better match what the backend does, or people will be making choices based on not-comparable test results. I think we should have test_fsync just automatically fold in O_DIRECT the same way the backend does. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
On 12/06/2010 08:38 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Making it support O_DIRECT would be possible but more complex; I don't see the point unless we think we're going to have open_sync_with_odirect as a separate option. Whether it's complex or not isn't really the issue. The issue is that what test_fsync is testing had better match what the backend does, or people will be making choices based on not-comparable test results. I think we should have test_fsync just automatically fold in O_DIRECT the same way the backend does. Indeed. We were quite confused for a while when we were dealing with this about a week ago, and my handwritten test program failed as expected but test_fsync didn't. Anything other than behaving just as the backend does violates POLA, in my view. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
Steve, If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. Compile and run test_fsync in PGSRC/src/tools/fsync. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
On Mon, Dec 6, 2010 at 11:54 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Hmm, that's actually quite a different timeout than what's required for synchronous replication. In synchronous replication, you need to get an acknowledgment within a timeout. This patch only puts a timeout on how long we wait to have enough room in the TCP send buffer. That doesn't seem all that useful. Yeah, I'm planning to implement that timeout for synchronous replication later. Since I thought that we should implement the timeout for *asynchronous* replication first and then extend it for synchronous replication, I created this patch. This kind of timeout is required for asynchronous replication since there is no acknowledgement from the standby. Most of the patch implements the non-blocking send function and changes walsender so that it uses that function instead of the existing blocking one. This will be the infrastructure for the timeout for synchronous replication. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
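To make the mechanism concrete, here is a rough, self-contained C sketch of a non-blocking send loop with an overall timeout, in the spirit of what the patch describes. The function name, the one-second poll interval, and the error handling are illustrative assumptions, not the actual walsender code; the sketch also assumes the socket has already been put into non-blocking mode.

/* nonblocking_send.c -- illustrative sketch only; names are made up */
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Try to send len bytes on a non-blocking socket, giving up if the peer
 * has not drained the send buffer within timeout_secs.  Returns 0 on
 * success, -1 on error or timeout. */
int
send_with_timeout(int sock, const char *buf, size_t len, int timeout_secs)
{
    size_t sent = 0;
    time_t deadline = time(NULL) + timeout_secs;

    while (sent < len)
    {
        ssize_t n = write(sock, buf + sent, len - sent);

        if (n > 0)
        {
            sent += (size_t) n;
            continue;
        }
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;          /* hard error on the connection */

        /* Send buffer is full: give up once the deadline has passed,
         * otherwise wait up to a second for the socket to become
         * writable and then re-check the deadline. */
        if (time(NULL) >= deadline)
            return -1;          /* standby looks unresponsive */

        struct pollfd pfd;
        pfd.fd = sock;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        if (poll(&pfd, 1, 1000 /* ms */) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}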
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
Mac OS X: Like Solaris, there's a similar mechanism but it's not O_DIRECT; see http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag for notes about the F_NOCACHE feature used. Same basic situation as Solaris; there's an API, but PostgreSQL doesn't use it yet. Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync might be better than open_sync. Results from a MacBook:

Sidney-Stratton:fsync josh$ ./test_fsync
Loops = 1

Simple write:
        8k write                          2121.004/second

Compare file sync methods using one write:
        (open_datasync unavailable)
        open_sync 8k write                1993.833/second
        (fdatasync unavailable)
        8k write, fsync                   1878.154/second

Compare file sync methods using two writes:
        (open_datasync unavailable)
        2 open_sync 8k writes             1005.009/second
        (fdatasync unavailable)
        8k write, 8k write, fsync         1709.862/second

Compare open_sync with different sizes:
        open_sync 16k write               1728.803/second
        2 open_sync 8k writes              969.416/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close            1772.572/second
        8k write, close, fsync            1939.897/second

-- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
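For anyone following along, the F_NOCACHE mechanism mentioned above is a per-file-descriptor fcntl flag rather than an open() flag. Here is a minimal sketch; the file name is a placeholder, and the #ifdef guard simply skips the call on platforms that do not define F_NOCACHE.

/* nocache.c -- illustrative sketch of macOS's F_NOCACHE, the rough
 * analogue of O_DIRECT discussed above; the path is a placeholder. */
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    int fd = open("datafile", O_WRONLY | O_CREAT, 0600);

#ifdef F_NOCACHE
    /* Ask the kernel not to cache reads/writes on this descriptor. */
    fcntl(fd, F_NOCACHE, 1);
#endif

    char buf[8192] = {0};
    write(fd, buf, sizeof(buf));
    close(fd);
    return 0;
}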
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Whether it's complex or not isn't really the issue. The issue is that what test_fsync is testing had better match what the backend does, or people will be making choices based on not-comparable test results. I think we should have test_fsync just automatically fold in O_DIRECT the same way the backend does. OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails. What should I have it do instead? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Mac OS X: Like Solaris, there's a similar mechanism but it's not O_DIRECT; see http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag for notes about the F_NOCACHE feature used. Same basic situation as Solaris; there's an API, but PostgreSQL doesn't use it yet. Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync might be better than open_sync. But I think you need to use fsync_writethrough if you actually want durability. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Mon, Dec 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote: Please explain more precisely what is wrong with SET SESSION AUTHORIZATION / SET ROLE. 1) Session GUCs do not change with a SET ROLE (this is a TODO I haven't had any time to work on) 2) Users can always issue their own SET ROLE and then hack into other users' data. Makes sense. It would be nice to fix those issues, independent of anything else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync might be better than open_sync. But I think you need to use fsync_writethrough if you actually want durability. Yeah. Unless your laptop contains an SSD, those numbers are garbage on their face. So that's another problem with test_fsync: it omits fsync_writethrough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
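For context, fsync_writethrough differs from a plain fsync by also asking the drive to flush its volatile write cache; on Mac OS X that is done, as far as I know, with the F_FULLFSYNC fcntl. A minimal sketch follows; the file name is a placeholder, and the #ifdef guard falls back to plain fsync where F_FULLFSYNC is not defined.

/* writethrough.c -- illustrative sketch of a "write through the drive
 * cache" flush; not the backend's code, file name is a placeholder. */
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    int  fd = open("walfile", O_WRONLY | O_CREAT, 0600);
    char buf[8192] = {0};

    write(fd, buf, sizeof(buf));

#ifdef F_FULLFSYNC
    /* Flush both the OS cache and the drive's volatile write cache. */
    fcntl(fd, F_FULLFSYNC, 0);
#else
    /* Plain fsync may leave data sitting in the drive's write cache. */
    fsync(fd);
#endif

    close(fd);
    return 0;
}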
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 12/6/10 6:10 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync might be better than open_sync. But I think you need to use fsync_writethrough if you actually want durability. Yeah. Unless your laptop contains an SSD, those numbers are garbage on their face. So that's another problem with test_fsync: it omits fsync_writethrough. Yeah, the issue with test_fsync appears to be that it's designed to work without OS-specific switches no matter what, not to accurately reflect how we access WAL. I'll see if I can do better. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Josh Berkus j...@agliodbs.com writes: OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails. What should I have it do instead? Report that it fails, and keep testing the other methods. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
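A minimal sketch of the suggested behavior -- try O_DIRECT, and if the open fails just report it and keep testing the other methods -- might look like the following. The file name and message wording are placeholders, not the actual test_fsync code.

/* odirect_fallback.c -- illustrative sketch of "report and keep going" */
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    int flags = O_WRONLY | O_CREAT;

#ifdef O_DIRECT
    flags |= O_DIRECT;
#endif

    int fd = open("test_fsync.out", flags, 0600);

    if (fd < 0)
    {
        /* e.g. EINVAL on file systems that do not support O_DIRECT */
        printf("(direct I/O open failed: %s -- skipping those tests)\n",
               strerror(errno));
        /* ... continue with the remaining sync-method tests here ... */
        return 0;
    }

    close(fd);
    return 0;
}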
Re: [HACKERS] Spread checkpoint sync
Excerpts from Greg Smith's message of dom dic 05 20:02:48 -0300 2010: What ends up happening if you push toward fully sync I/O is the design you see in some other databases, where you need multiple writer processes. Then requests for new pages can continue to allocate as needed, while keeping any one write from blocking things. That's one sort of a way to simulate asynchronous I/O, and you can substitute true async I/O instead in many of those implementations. We didn't have much luck with portability on async I/O when that was last experimented with, and having multiple background writer processes seems like overkill; that whole direction worries me. Why would multiple bgwriter processes worry you? Of course, it wouldn't work to have multiple processes trying to execute a checkpoint simultaneously, but what if we separated the tasks so that one process is in charge of checkpoints, and another one is in charge of the LRU scan? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
All, Geirth's results from his FreeBSD 7.1 server using 8.4's test_fsync:

Simple write timing:
        write                     0.007081

Compare fsync times on write() and non-write() descriptor:
If the times are similar, fsync() can sync data written
on a different descriptor.
        write, fsync, close       5.937933
        write, close, fsync       8.056394

Compare one o_sync write to two:
        one 16k o_sync write      7.366927
        two 8k o_sync writes     15.299300

Compare file sync methods with one 8k write:
        (o_dsync unavailable)
        open o_sync, write        7.512682
        (fdatasync unavailable)
        write, fsync              5.856480

Compare file sync methods with two 8k writes:
        (o_dsync unavailable)
        open o_sync, write       15.472910
        (fdatasync unavailable)
        write, fsync              5.880319

... again, open_sync does not look very impressive. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Excerpts from Robert Haas's message of lun dic 06 23:09:56 -0300 2010: On Mon, Dec 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote: Please explain more precisely what is wrong with SET SESSION AUTHORIZATION / SET ROLE. 1) Session GUCs do not change with a SET ROLE (this is a TODO I haven't had any time to work on) 2) Users can always issue their own SET ROLE and then hack into other users' data. Makes sense. It would be nice to fix those issues, independent of anything else. It seems plausible to fix the first one, but how would you fix the second one? You either allow SET ROLE (which you need, to support the pooler changing authorization), or you don't. There doesn't seem to be a usable middle ground. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal_sender_delay is still required?
On Tue, Dec 7, 2010 at 12:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: Walsender doesn't need the periodic wakeups anymore, thanks to the latch feature. So wal_sender_delay is basically useless now. How about dropping wal_sender_delay or increasing the default value? If we don't need it, we should remove it. The attached patch removes wal_sender_delay and uses a hard-coded 10 seconds instead of wal_sender_delay as the delay between activity rounds for walsender. One problem with the patch is that it takes longer (at most 10s) to detect the unexpected death of the postmaster (by calling PostmasterIsAlive()). This is OK with me. But does anyone need to specify the delay so that this can be detected more quickly? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center drop_wal_sender_delay_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation
On Sun, Dec 5, 2010 at 01:04, Peter Eisentraut pete...@gmx.net wrote: Here is an updated patch to address the issues discussed during this commitfest. I found another issue in the patch; ILIKE in the WHERE clause doesn't work. It was surprising because LIKE in the WHERE clause and ILIKE in the SELECT list work as expected:
- SELECT * FROM pg_class WHERE relname LIKE 'pg%'
- SELECT relname ILIKE 'pg%' FROM pg_class;

postgres=# SELECT name, setting FROM pg_settings WHERE name IN ('lc_ctype', 'lc_collate', 'server_encoding');
      name       | setting
-----------------+---------
 lc_collate      | C
 lc_ctype        | C
 server_encoding | UTF8
(3 rows)

postgres=# SELECT * FROM pg_class WHERE relname ILIKE 'pg%';
ERROR:  no collation was derived

-- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers