Re: [HACKERS] How to determine failed connection attempt due to invalid authorization (libpq)?
On Mon, Oct 18, 2010 at 07:26:32AM -0700, David Fetter wrote:
> On Mon, Oct 18, 2010 at 10:18:25AM -0400, Robert Haas wrote:
>> On Mon, Oct 18, 2010 at 10:05 AM, David Fetter <da...@fetter.org> wrote:
>>> On Mon, Oct 18, 2010 at 10:03:55AM -0400, Robert Haas wrote:
>>>> On Sun, Oct 17, 2010 at 2:03 AM, Dmitriy Igrishin <dmit...@gmail.com> wrote:
>>>>> I've asked pgsql-general. Unfortunately it seems that there is no
>>>>> better way to do it except parsing PQerrorMessage(). Sadly.
>>>> Yeah, doesn't look like it. A quick glance at the code reveals that a
>>>> PGresult can store individual error fields but a PGconn can store
>>>> only a message. :-(
>>> Does this seem worth patching for 9.1?
>> Please feel free to submit a patch if you have an idea how to solve it.
> Will look that over this evening and submit an idea for a patch :)

I'm pretty slammed by some other commitments at the moment. Probably won't get a chance to do this until at least this weekend.

Cheers,
David.

--
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
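To make the limitation above concrete, here is a minimal libpq sketch of the workaround the thread describes. Since PQerrorMessage() on a PGconn returns only flat text, classifying a connection failure means matching against the human-readable (and localizable) message; the connection string and matched substring below are illustrative assumptions, not part of the thread:

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* hypothetical connection string */
        PGconn *conn = PQconnectdb("host=localhost dbname=test user=alice");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            const char *msg = PQerrorMessage(conn);

            /*
             * There is no PQresultErrorField()-style accessor for a PGconn,
             * so detecting an authentication failure means grepping the
             * message text, which breaks under localization.
             */
            if (strstr(msg, "authentication failed") != NULL)
                fprintf(stderr, "auth failure: %s", msg);
            else
                fprintf(stderr, "connection failed: %s", msg);
        }
        PQfinish(conn);
        return 0;
    }

(Build with cc test.c -lpq.) A PGresult, by contrast, exposes structured fields such as the SQLSTATE through PQresultErrorField(), which is exactly the asymmetry the thread complains about.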
Re: [HACKERS] WIP: extensible enums
On 19 October 2010 05:21, Andrew Dunstan <and...@dunslane.net> wrote:
> On 10/18/2010 10:52 AM, Tom Lane wrote:
>> We could possibly deal with enum types that follow the existing
>> convention if we made the cache entry hold a list of all the original,
>> known-to-be-sorted OIDs. (This could be reasonably compact and cheap to
>> probe if it were represented as a starting OID and a Bitmapset of delta
>> values, since we can assume that the initial set of OIDs is pretty
>> close together.) But we have to have that cache entry, and we have to
>> consult it on every single comparison, so it's definitely going to be
>> slower than before.
>>
>> So I'm thinking the comparison procedure goes like this:
>>
>> 1. Both OIDs even? If so, just compare them numerically, and we're done.
>> 2. Look up the cache entry for the enum type.
>> 3. Both OIDs in the list of known-sorted OIDs? If so, just compare them
>> numerically, and we're done.
>> 4. Search the part of the cache entry that lists sort positions. If not
>> both present, refresh the cache entry. If still not present, throw error.
>> 5. Compare by sort positions.
>>
>> Step 4 is the slowest part but would be avoided in most cases. However,
>> step 2 is none too speedy either, and would usually be required when
>> dealing with pre-existing enums.
>
> OK, I've made adjustments that I think do what you're suggesting. Patch
> is attached.

Ah, I'd missed the point about the bitmapset. In the most common case, most of the enum elements are probably going to be in the right order, so you save a lot by identifying that case quickly.

I didn't have time to play with hash maps myself, but I don't think it will make much difference now, because hopefully the binary search isn't going to be hit a lot anyway.

There are a couple of things that look odd about this code, though (I haven't tested it, but they look wrong). In the loop identifying the longest range:

    for (list_end = start_pos + 1; list_end < num; list_end++)
        if (mycache->sort_order_list[list_end].sort_order <
            mycache->sort_order_list[list_end - 1].sort_order ||
            mycache->sort_order_list[list_end].enum_oid -
            mycache->sort_order_list[list_end].enum_oid >= BITMAPSIZE)

That last test isn't doing anything. Shouldn't the second list_end be start_pos?

In the loop that sets bits in the bitmap, it looks like it's assuming the range starts at 0. I haven't had any caffeine yet, so maybe I'm misunderstanding it, but I can't see anywhere that sets bitmap_base.

Regards,
Dean

> Alternatively this can be pulled from g...@github.com:adunstan/postgresql-dev.git
>
> cheers
>
> andrew
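To make the fast paths in Tom's quoted procedure easier to follow, here is a standalone, hypothetical C sketch of steps 1 and 3. The struct layout, names, and the placeholder for steps 4-5 are invented for illustration and are not the patch's actual code:

    #include <stdint.h>
    #include <stdio.h>

    #define BITMAPSIZE 1024               /* max delta tracked per enum type */

    typedef struct EnumCacheEntry
    {
        uint32_t bitmap_base;             /* lowest OID in the known-sorted set */
        uint8_t  bitmap[BITMAPSIZE / 8];  /* bit i set => bitmap_base+i is sorted */
    } EnumCacheEntry;

    /* Step 3 helper: is this OID in the original, known-to-be-sorted set? */
    static int
    in_sorted_set(const EnumCacheEntry *cache, uint32_t oid)
    {
        uint32_t delta;

        if (oid < cache->bitmap_base)
            return 0;
        delta = oid - cache->bitmap_base;
        if (delta >= BITMAPSIZE)
            return 0;
        return (cache->bitmap[delta / 8] >> (delta % 8)) & 1;
    }

    static int
    enum_oid_cmp(const EnumCacheEntry *cache, uint32_t a, uint32_t b)
    {
        /* Step 1: even OIDs are assigned in sort order; compare numerically. */
        if ((a & 1) == 0 && (b & 1) == 0)
            return (a > b) - (a < b);

        /* Step 3: both in the known-sorted set? Numeric order is still valid. */
        if (in_sorted_set(cache, a) && in_sorted_set(cache, b))
            return (a > b) - (a < b);

        /* Steps 4-5 would refresh the cache and binary-search sort positions. */
        return 0;                         /* placeholder only */
    }

    int
    main(void)
    {
        EnumCacheEntry cache = { 17000, {0} };

        cache.bitmap[0] = 0x07;           /* mark 17000..17002 as known-sorted */
        printf("%d\n", enum_oid_cmp(&cache, 17001, 17002));   /* prints -1 */
        return 0;
    }

The point of the delta bitmap is that step 3 costs two bit tests once the cache entry is in hand, so pre-existing enums whose OIDs happen to be in sort order still compare almost as cheaply as the new all-even case.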
Re: [HACKERS] WIP: extensible enums
On 19 October 2010 05:21, Andrew Dunstan <and...@dunslane.net> wrote:
> On 10/18/2010 10:52 AM, Tom Lane wrote:
>> We could possibly deal with enum types that follow the existing
>> convention if we made the cache entry hold a list of all the original,
>> known-to-be-sorted OIDs. [...]
>
> OK, I've made adjustments that I think do what you're suggesting. Patch
> is attached. Alternatively this can be pulled from
> g...@github.com:adunstan/postgresql-dev.git

Andrew, can't you get your own repo at git.postgresql.org?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
Re: [HACKERS] WIP: extensible enums
On Tue, Oct 19, 2010 at 10:19, Thom Brown <t...@linux.com> wrote:
> On 19 October 2010 05:21, Andrew Dunstan <and...@dunslane.net> wrote:
>> On 10/18/2010 10:52 AM, Tom Lane wrote:
>>> We could possibly deal with enum types that follow the existing
>>> convention if we made the cache entry hold a list of all the
>>> original, known-to-be-sorted OIDs. [...]
>>
>> OK, I've made adjustments that I think do what you're suggesting.
>> Patch is attached. Alternatively this can be pulled from
>> g...@github.com:adunstan/postgresql-dev.git
>
> Andrew, can't you get your own repo at git.postgresql.org?

He certainly could, but github provides more features and a nicer look :-) And since it's git, it doesn't matter where the repo is.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] leaky views, yet again
(2010/10/14 1:52), Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Wed, Oct 13, 2010 at 11:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> That's all true, but you have to consider how much the obstacle
>>> actually gets in their way versus how painful it is on your end to
>>> create and maintain the obstacle. I don't think this proposed patch
>>> measures up very well on either end of that tradeoff.
>>
>> I think it would behoove us to try to separate concerns about this
>> particular patch from concerns about the viability of the whole
>> approach. Whether or not it's useful to do X is a different question
>> than whether it can be done with few enough lines of code and/or
>> whether this patch actually does it well.
>
> I think I left the wrong impression: I'm concerned about the whole
> approach. I haven't even read this particular patch lately. I think
> that trying to guarantee that the planner applies independent
> constraints in a particular order will be expensive, fragile, and
> prone to recurring security bugs no matter how it's implemented ---
> unless you do it by lobotomizing query pullup/pushdown, which seems
> unacceptable from a performance standpoint.
>
> Just to give one example of what this patch misses (probably; as I
> said I haven't read it lately), what about selectivity estimation?
> One of the things we like to do when we have an otherwise-unknown
> function is to try it on all the histogram members and see what
> percentage yield true. That might already represent enough of an
> information leak to be objectionable ... and yet, if we don't do it,
> the consequences for performance could be pretty horrid because we'll
> have to work without any reasonable selectivity estimate at all.
> There are cases where this technique isn't applied at the moment but
> probably should be, which is why I characterize the leak-prevention
> idea as creating future security issues: doing that is a constraint
> that will have to be accounted for in every future planner change,
> and we are certain to miss the implications sometimes.

Sorry, I might have misread what you pointed out. Do you see oprrest/oprjoin being internally invoked as a problem?

Well, I also think it is a problem, as long as they can be installed by non-superusers. But it is a separate problem from the row-level security issues.

In my opinion, the origin of the matter is incorrect checks on creation of operators. It allows non-superusers to define a new operator with restriction and join estimators as long as he has EXECUTE privilege on the functions (not the EXECUTE privilege of the actual user who invokes these functions in the estimation phase!). Then, the optimizer may internally invoke these functions without any privilege checks on either the function or the table being estimated. If a malicious user tries to make other users invoke a trojan horse, he can define a trappy operator with malicious estimator functions, can't he?

I think this is a situation where we should require superuser privilege, which means the specified functions have no malicious intention; it is equivalent to the privilege to grant EXECUTE to public.

There is a similar problem with conversion functions. If a malicious user wants to install a fake conversion function that records all the traffic between server and client, it seems to me not impossible.

So, I'd like to suggest revising the specification of the permission checks on these commands. If we can ensure these functions are not malicious, there is no need to worry about information leaks via untrusted functions used internally.

Thanks,
--
KaiGai Kohei <kai...@ak.jp.nec.com>
Re: [HACKERS] Simplifying replication
Hi,

Josh Berkus <j...@agliodbs.com> writes:
> It is critical that we make replication easier to set up, administrate
> and monitor than it currently is. In my conversations with people,
> this is more important to our users and the adoption of PostgreSQL
> than synchronous replication is.

I want to say a big big +1 here. The way replication and PITR setup are implemented now is a very good prototype; it's time to consolidate and get to something usable by normal people, as opposed to PostgreSQL full-time geeks. Well, the current setup offers lots of flexibility, which we'd better not lose in the process, but the simple setup simply does not exist yet.

> 1. Any postgresql standalone server can become a replication master
> simply by enabling replication connections in pg_hba.conf. No other
> configuration is required, and no server restart is required.

That sounds as simple as changing the default wal_level to hot_standby, and the default max_wal_senders to non-zero.

> 2. Should I choose to adjust master configuration, for say performance
> reasons, most replication variables (including ones like
> wal_keep_segments) should be changeable without a server restart.

Anybody know how difficult that is, without having to spend lots of time studying the source code with the question in mind?

> 3. I can configure a standby by copying the same postgresql.conf as on
> the master. I only have to change a single configuration variable (the
> primary_conninfo, or maybe a replication_mode setting) in order to
> start the server in standby mode. GUCs which apply only to masters are
> ignored.
>
> 4. I can start a new replica off the master by running a single
> command-line utility on the standby and giving it connection
> information to the master. Using this connection, it should be able to
> start a backup snapshot, copy the entire database and any required
> logs, and then come up in standby mode. All that should be required
> for this is one or two high-port connections to the master. No
> recovery.conf file is required, or exists.

There's a prototype to stream a base backup from a libpq connection; I think someone here wanted to integrate that into the replication protocol itself. It should be doable with a simple libpq connection and all automated.

The pg_basebackup python client software is 100 lines of code. It's mainly a recursive query to get the list of files within the master, then two server-side functions to get binary file chunks, compressed. Then, client side, a loop to decompress and write the chunks at the right place. That's it.

  http://github.com/dimitri/pg_basebackup/blob/master/pg_basebackup.py

I could prepare a patch given some advice on the replication protocol integration. For one, is streaming a base backup something that walsender should care about?

> 5. I can cause the standby to fail over with a single command to the
> failover server. If this is a trigger file, then it already has a
> default path to the trigger file in postgresql.conf, so that this does
> not require reconfiguration and restart of the standby at crisis time.
> Ideally, I use a pg_failover command or something similar. This
> feature is in walmgr.py from Skytools and it's something necessary to
> have in -core now that we have failover standby capacity.

Much agreed, and the pg_failover command is a good idea. BTW, do we have a clear idea of how to implement pg_ping, and should it report the current WAL location(s) of a standby?

> 6. Should I decide to make the standby the new master, this should
> also be possible with a single command and a one-line configuration on
> the other standbys. To aid this, we have an easy way to tell which
> standby in a group is most caught up. If I try to promote the wrong
> standby (it's behind or somehow incompatible), it should fail with an
> appropriate message.

That needs a way to define a group of standbys; there's nothing there that makes them know about each other. That could fall out of the automated registration of standbys in a shared catalog on the master, with this shared catalog spread over (hard-coded) asynchronous replication (sync == disaster here). But there's no agreement on this feature yet. Then you need a way to organise them into groups in this shared catalog, and you need to ask your network admins to make it so that they can communicate with each other.

Now say we have pg_ping (or another tool) returning the current recv, applied and synced LSNs; it would then be possible for any standby to figure out which other ones must be shot in case you fail over here. The failover command could list those other standbys in the group that you're behind of, and with a force command allow you to still fail over to this one. Now you have to STONITH the ones listed, but that's your problem after all. Then, of course, any standby that's not in the same group as the one that you failed over to has to be checked and resynced.

> 7. Should I choose to use archive files as well as streaming
> replication, the utilities to
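On the "which standby in a group is most caught up" question in point 6 above, a small libpq sketch shows how little is needed once each standby can report its replay position. The connection strings are hypothetical, and pg_last_xlog_replay_location() is the 9.0-era function name:

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    report(const char *conninfo)
    {
        PGconn   *conn = PQconnectdb(conninfo);
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s: %s", conninfo, PQerrorMessage(conn));
            PQfinish(conn);
            return;
        }

        res = PQexec(conn, "SELECT pg_last_xlog_replay_location()");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("%s: replayed up to %s\n", conninfo, PQgetvalue(res, 0, 0));
        PQclear(res);
        PQfinish(conn);
    }

    int
    main(void)
    {
        /* hypothetical standbys in one failover group */
        report("host=standby1 dbname=postgres");
        report("host=standby2 dbname=postgres");
        return 0;
    }

Comparing the returned LSNs tells you which candidate is furthest ahead; a pg_failover-style tool could refuse to promote any standby that is behind another member of its group.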
Re: [HACKERS] Extensions, this time with a patch
On Mon, Oct 18, 2010 at 8:37 PM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Here's another version of the patch, v3. Changes:

CREATE EXTENSION is an interesting feature! I just compiled it and tested it via SQL commands. Here is a quick report.

* There are some compiler warnings. You might be missing something in copyfuncs and equalfuncs.

    copyfuncs.c:3119: warning: ‘_copyCreateExtensionStmt’ defined but not used
    copyfuncs.c:3130: warning: ‘_copyDropExtensionStmt’ defined but not used
    equalfuncs.c:1593: warning: ‘_equalCreateExtensionStmt’ defined but not used
    equalfuncs.c:1602: warning: ‘_equalDropExtensionStmt’ defined but not used
    postinit.c: In function ‘CheckMyDatabase’:
    postinit.c:341: warning: implicit declaration of function ‘ExtensionSetCVC’

* There might be some bugs in pg_dump:

    postgres=# CREATE EXTENSION dblink;
    NOTICE: Installing extension 'dblink' from '$PGHOME/share/contrib/dblink.sql', with user data
    CREATE EXTENSION
    postgres=# \q
    $ pg_dump
    pg_dump: schema with OID 2200 does not exist, but is needed for object 16411

* The example in the doc, CREATE EXTENSION hstore, dumps surprising warning messages. It would be better to avoid such messages, though it's not an issue for EXTENSION.

    WARNING: => is deprecated as an operator name
    DETAIL: This name may be disallowed altogether in future versions of PostgreSQL.
    CONTEXT: SQL statement /* contrib/hstore/hstore.sql.in */ (followed by dumped script)

* Docs: sql-createextension.html has two odd links: See Also DROP EXTENSION, Table 9-61, Appendix F

--
Itagaki Takahiro
Re: [HACKERS] Extensions, this time with a patch
Itagaki Takahiro <itagaki.takah...@gmail.com> writes:
> CREATE EXTENSION is an interesting feature! I just compiled it and
> tested it via SQL commands. Here is a quick report.

Thanks for your time and interest!

> * There are some compiler warnings. You might be missing something in
>   copyfuncs and equalfuncs.
>     copyfuncs.c:3119: warning: ‘_copyCreateExtensionStmt’ defined but not used
>     copyfuncs.c:3130: warning: ‘_copyDropExtensionStmt’ defined but not used
>     equalfuncs.c:1593: warning: ‘_equalCreateExtensionStmt’ defined but not used
>     equalfuncs.c:1602: warning: ‘_equalDropExtensionStmt’ defined but not used
>     postinit.c: In function ‘CheckMyDatabase’:
>     postinit.c:341: warning: implicit declaration of function ‘ExtensionSetCVC’

Ouch, sorry about that, I didn't spot them. Will fix and post a v4 patch soon.

> * There might be some bugs in pg_dump:
>     postgres=# CREATE EXTENSION dblink;
>     NOTICE: Installing extension 'dblink' from '$PGHOME/share/contrib/dblink.sql', with user data
>     CREATE EXTENSION
>     postgres=# \q
>     $ pg_dump
>     pg_dump: schema with OID 2200 does not exist, but is needed for object 16411

I've hit that sometimes, but thought it was tied to the dependency bug fixed in the v3 patch. I can reproduce it here; will fix too.

> * The example in the doc, CREATE EXTENSION hstore, dumps surprising
>   warning messages. It would be better to avoid such messages, though
>   it's not an issue for EXTENSION.
>     WARNING: => is deprecated as an operator name
>     DETAIL: This name may be disallowed altogether in future versions of PostgreSQL.
>     CONTEXT: SQL statement /* contrib/hstore/hstore.sql.in */ (followed by dumped script)

I don't have a dumped script here; maybe that depends on verbosity options?

> * Docs: sql-createextension.html has two odd links: See Also
>   DROP EXTENSION, Table 9-61, Appendix F

I didn't know if using xref would do, but if you find that odd, I will replace it with linkend and a custom label there.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] max_wal_senders must die
Josh Berkus wrote:
> Hackers,
>
> What purpose is served, exactly, by max_wal_senders? In order for a
> standby to connect, it must have a superuser login, and replication
> connections must be enabled in pg_hba.conf. How is having one more
> setting in one more file you have to enable on the master benefitting
> anyone? Under what bizarre set of circumstances would anyone have
> runaway connections from replicas to the master?
>
> Proposed that we simply remove this setting in 9.1. The real maximum
> wal senders should be whatever max_connections is.

I disagree - limiting the maximum number of replication connections is important for my use cases. Replication connections are significantly more heavyweight than normal connections, and right now I simply use this setting to prevent stupid mistakes (especially in virtualized/cloud-style environments).

What we really should look into is using a less privileged role - or a dedicated replication role - and use the existing per-role connection limit feature. That feature is unlimited by default, people can change it as for every other role, and we could get rid of that GUC.

Stefan
Re: [HACKERS] max_wal_senders must die
On Tue, Oct 19, 2010 at 13:14, Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> wrote:
> Josh Berkus wrote:
>> What purpose is served, exactly, by max_wal_senders? [...]
>> Proposed that we simply remove this setting in 9.1. The real maximum
>> wal senders should be whatever max_connections is.
>
> I disagree - limiting the maximum number of replication connections is
> important for my use cases. Replication connections are significantly
> more heavyweight than normal connections, and right now I simply use
> this setting to prevent stupid mistakes (especially in
> virtualized/cloud-style environments).
>
> What we really should look into is using a less privileged role - or a
> dedicated replication role - and use the existing per-role connection
> limit feature. That feature is unlimited by default, people can change
> it as for every other role, and we could get rid of that GUC.

+1 for being able to control it that way - that should keep it simple for the newbie use case, while retaining the ability for fine-grained control for those who need it. I think it's already on the TODO for 9.1 to use a separate role for it...

If we want something fixed *now*, should we perhaps just bump the *default* value for max_wal_senders to 5 or something?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] comments on type attributes broken in 9.1devel
On Tue, Oct 19, 2010 at 1:46 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> The comment code refactoring c10575ff005c330d0475345621b7d381eb510c48
> broke comments on composite type attributes:
>
>     COMMENT ON COLUMN foo.a is 'test';
>     ERROR: 42809: foo is a composite type
>     LOCATION: heap_openrv, heapam.c:1110
>
> This is because as the code was moved from CommentAttribute() to
> get_object_address_attribute(), the call to relation_openrv() became a
> heap_openrv(), which causes the above error. Please check it out.

Thanks. I changed that and added a regression test.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] max_wal_senders must die
Magnus Hagander wrote:
> On Tue, Oct 19, 2010 at 13:14, Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> wrote:
>> What we really should look into is using a less privileged role - or
>> a dedicated replication role - and use the existing per-role
>> connection limit feature. [...]
>
> +1 for being able to control it that way - that should keep it simple
> for the newbie use case, while retaining the ability for fine-grained
> control for those who need it. I think it's already on the TODO for
> 9.1 to use a separate role for it...

I think we had some plans to do that - I wonder how hard it would be to just do the dedicated-role thing for now (maybe with the only constraint being that it can only be used on a replication connection) and look into making it (technically) less privileged later?

> If we want something fixed *now*, should we perhaps just bump the
> *default* value for max_wal_senders to 5 or something?

Or accept -1 for unlimited and use that by default. That would fix part of the complaint from Josh, but you would still have to restart the master to implement a limit...

Stefan
Re: [HACKERS] leaky views, yet again
On Oct 19, 2010, at 4:34 AM, KaiGai Kohei <kai...@ak.jp.nec.com> wrote:
> (2010/10/14 1:52), Tom Lane wrote:
>> [discussion of planner constraint ordering and selectivity
>> estimation; quoted in full upthread]
>
> Sorry, I might have misread what you pointed out.

I think you're still misreading it. Want to try a third time?

> Do you see oprrest/oprjoin being internally invoked as a problem?
>
> Well, I also think it is a problem, as long as they can be installed
> by non-superusers. But it is a separate problem from the row-level
> security issues.
>
> In my opinion, the origin of the matter is incorrect checks on
> creation of operators. It allows non-superusers to define a new
> operator with restriction and join estimators as long as he has
> EXECUTE privilege on the functions (not the EXECUTE privilege of the
> actual user who invokes these functions in the estimation phase!).
> Then, the optimizer may internally invoke these functions without any
> privilege checks on either the function or the table being estimated.
> If a malicious user tries to make other users invoke a trojan horse,
> he can define a trappy operator with malicious estimator functions,
> can't he?

This is a pretty poor excuse for a Trojan horse attack.

...Robert
Re: [HACKERS] Extensions, this time with a patch
Itagaki Takahiro <itagaki.takah...@gmail.com> writes:
> * There are some compiler warnings. You might be missing something in
>   copyfuncs and equalfuncs.

Fixed in v4, attached.

> * There might be some bugs in pg_dump:
>     pg_dump: schema with OID 2200 does not exist, but is needed for object 16411

Fixed in v4, attached.

The problem was with the query used in pg_dump to filter out relations that are part of an extension, in getTables(). A composite type will create an underlying relation of relkind 'c', but there was no direct dependency to the extension, so the filter failed to bypass it. It's fixed by adding a direct internal dependency between the relation and the extension, as that's so much easier than doing a recursive scan of pg_depend in pg_dump's SQL queries. I will try to find out if other cases have been forgotten like this one.

Note that as a result you need to drop extension dblink and create it again in order to benefit from the fix. Also note that drop extension recursively follows the dependencies, so it is not affected by this bug.

> * The example in the doc, CREATE EXTENSION hstore, dumps surprising
>   warning messages. It would be better to avoid such messages, though
>   it's not an issue for EXTENSION.
>     WARNING: => is deprecated as an operator name
>     DETAIL: This name may be disallowed altogether in future versions of PostgreSQL.
>     CONTEXT: SQL statement /* contrib/hstore/hstore.sql.in */ (followed by dumped script)

I didn't realise that using SPI would mean dumping the whole script in case of even NOTICEs. Maybe we want to protect against that in the CREATE EXTENSION case, but I didn't look at how to do it. Do we want CREATE EXTENSION to be more quiet than SPI usually is?

> * Docs: sql-createextension.html has two odd links: See Also
>   DROP EXTENSION, Table 9-61, Appendix F

Fixed in v4, attached.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support

extension.v4.patch.gz
Description: pg_dump support for PostgreSQL extensions
Re: [HACKERS] Extensions, this time with a patch
On Oct 19, 2010, at 8:33 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> I didn't realise that using SPI would mean dumping the whole script in
> case of even NOTICEs. Maybe we want to protect against that in the
> CREATE EXTENSION case, but I didn't look at how to do it. Do we want
> CREATE EXTENSION to be more quiet than SPI usually is?

I don't see why. I think the real action item here is to remove => from hstore.

...Robert
Re: [HACKERS] Extensions, this time with a patch
Robert Haas <robertmh...@gmail.com> writes:
> I don't see why. I think the real action item here is to remove =>
> from hstore.

As input, consider that lots of extensions will create types that are only a shell at the moment of the CREATE TYPE, and for each of those types you will see the (potentially 1000-line-long) whole SQL script dumped on the screen. In the following transcript I've filtered out the scripts, but one is written out for each NOTICE message that you see:

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension citext;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'citext' from '/Users/dim/pgsql/exts/share/contrib/citext.sql', with user data
    NOTICE: return type citext is only a shell
    NOTICE: argument type citext is only a shell
    NOTICE: return type citext is only a shell
    NOTICE: argument type citext is only a shell

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension cube;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'cube' from '/Users/dim/pgsql/exts/share/contrib/cube.sql', with user data
    NOTICE: type cube is not yet defined
    NOTICE: return type cube is only a shell
    NOTICE: return type cube is only a shell
    NOTICE: argument type cube is only a shell

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension earthdistance;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'earthdistance' from '/Users/dim/pgsql/exts/share/contrib/earthdistance.sql', with user data

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension fuzzystrmatch;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'fuzzystrmatch' from '/Users/dim/pgsql/exts/share/contrib/fuzzystrmatch.sql', with user data

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension hstore;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'hstore' from '/Users/dim/pgsql/exts/share/contrib/hstore.sql', with user data
    NOTICE: return type hstore is only a shell
    NOTICE: argument type hstore is only a shell
    NOTICE: return type hstore is only a shell
    NOTICE: argument type hstore is only a shell
    NOTICE: return type ghstore is only a shell
    NOTICE: argument type ghstore is only a shell

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension isn;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'isn' from '/Users/dim/pgsql/exts/share/contrib/isn.sql', with user data
    NOTICE: type ean13 is not yet defined
    NOTICE: argument type ean13 is only a shell
    NOTICE: type isbn13 is not yet defined
    NOTICE: argument type isbn13 is only a shell
    NOTICE: type ismn13 is not yet defined
    NOTICE: argument type ismn13 is only a shell
    NOTICE: type issn13 is not yet defined
    NOTICE: argument type issn13 is only a shell
    NOTICE: type isbn is not yet defined
    NOTICE: argument type isbn is only a shell
    NOTICE: type ismn is not yet defined
    NOTICE: argument type ismn is only a shell
    NOTICE: type issn is not yet defined
    NOTICE: argument type issn is only a shell
    NOTICE: type upc is not yet defined
    NOTICE: argument type upc is only a shell

    dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension ltree;" 2>&1 | egrep 'NOTICE|ERROR'
    NOTICE: Installing extension 'ltree' from '/Users/dim/pgsql/exts/share/contrib/ltree.sql', with user data
    NOTICE: type ltree is not yet defined
    NOTICE: argument type ltree is only a shell
    NOTICE: type lquery is not yet defined
    NOTICE: argument type lquery is only a shell
    NOTICE: type ltxtquery is not yet defined
    NOTICE: argument type ltxtquery is only a shell
    NOTICE: type ltree_gist is not yet defined
    NOTICE: argument type ltree_gist is only a shell

Just saying,
--
Dimitri Fontaine
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support

PS: Oh, a repalloc() bug now. Will fix later in the afternoon: \dx or select * from pg_extensions(); crashes with more than 10 extensions installed in the v4 patch. That's what I get for doing that on a Saturday evening.
Re: [HACKERS] leaky views, yet again
(2010/10/19 21:31), Robert Haas wrote:
> On Oct 19, 2010, at 4:34 AM, KaiGai Kohei <kai...@ak.jp.nec.com> wrote:
>> (2010/10/14 1:52), Tom Lane wrote:
>>> [...] Just to give one example of what this patch misses (probably;
>>> as I said I haven't read it lately), what about selectivity
>>> estimation? One of the things we like to do when we have an
>>> otherwise-unknown function is to try it on all the histogram members
>>> and see what percentage yield true. That might already represent
>>> enough of an information leak to be objectionable ... and yet, if we
>>> don't do it, the consequences for performance could be pretty horrid
>>> because we'll have to work without any reasonable selectivity
>>> estimate at all. [...]
>>
>> Sorry, I might have misread what you pointed out.
>
> I think you're still misreading it.

Hmm. In my understanding, he was concerned about possibly leaky estimator functions, so he mentioned the horrible performance degradation if we don't allow these functions to be executed. So, I suggested an idea that enforces all estimator functions being installed by superusers; that would enable us to assume they are safe enough.

> Want to try a third time?

However, it is actually still unclear to me... :-(

>> Do you see oprrest/oprjoin being internally invoked as a problem?
>> [...] If a malicious user tries to make other users invoke a trojan
>> horse, he can define a trappy operator with malicious estimator
>> functions, can't he?
>
> This is a pretty poor excuse for a Trojan horse attack.
>
> ...Robert

--
KaiGai Kohei <kai...@kaigai.gr.jp>
Re: [HACKERS] Extensions, this time with a patch
On Tue, Oct 19, 2010 at 9:07 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> I don't see why. I think the real action item here is to remove =>
>> from hstore.
>
> As input, consider that lots of extensions will create types that are
> only a shell at the moment of the CREATE TYPE, and for each of those
> types you will see the (potentially 1000-line-long) whole SQL script
> dumped on the screen. In the following transcript I've filtered out
> the scripts, but one is written out for each NOTICE message that you
> see:
>
>     dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension citext;" 2>&1 | egrep 'NOTICE|ERROR'
>     NOTICE: Installing extension 'citext' from '/Users/dim/pgsql/exts/share/contrib/citext.sql', with user data
>     NOTICE: return type citext is only a shell
>     NOTICE: argument type citext is only a shell
>     NOTICE: return type citext is only a shell
>     NOTICE: argument type citext is only a shell

Well, perhaps it's reasonable to suppress NOTICE messages then. That particular message could perhaps be quieted, but we probably don't want to do that with every NOTICE that might occur (e.g. those that you might get on table creation for sequences, indices, etc.).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: extensible enums
Excerpts from Magnus Hagander's message of mar oct 19 05:23:31 -0300 2010:
> He certainly could, but github provides more features and a nicer look
> :-) And since it's git, it doesn't matter where the repo is.

Yeah. If you have a checked-out copy of the Git repo (preferably one with the master branch in it), try this:

    git remote add venum git://github.com/adunstan/postgresql-dev.git
    git fetch venum venums:venums
    git checkout venums

Then you have the patch in all its Git glory, in branch venums.

--
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] create tablespace fails silently, or succeeds improperly
Bruce Momjian <br...@momjian.us> writes:
> Tom Lane wrote:
>> Bruce Momjian <br...@momjian.us> writes:
>>> Looking at the pg_upgrade code some more, I found that it was not
>>> removing the PG_VERSION file when deleting <= 8.4 tablespace files.
>>> This might confuse administrators, so the attached patch adds the
>>> removal of PG_VERSION. I would like to apply this to master and
>>> 9.0.X.
>>
>> ... why is that a good idea?
>
> The script already deletes the 8.4 database directories, but leaves
> PG_VERSION behind. Why keep it when all the 8.4 data is gone? The
> script also deletes PGDATA for 8.4, so there is nothing left pointing
> to that directory.

Oh, I misunderstood: I thought you were proposing to do this as an automatic action inside pg_upgrade. If it's part of the cleanup script, it's fine.

regards, tom lane
Re: [HACKERS] knngist - 0.8
On Mon, Oct 18, 2010 at 04:13:04PM +0100, Mark Cave-Ayland wrote:
> David Fetter wrote:
>>> For my vote, I'd prefer either the Oid of a custom type or an array
>>> of Oid, Datum pairs - i.e. something we can extend in the future if
>>> required.
>>
>> This sounds a lot like a foreign key to another table. Are you not
>> proposing doing that because of performance considerations?
>>
>> Cheers, David.
>
> Well, in PostGIS a typmod contains 3 pieces of information:
>
> 1) the SRID
> 2) the dimension
> 3) the geometry type
>
> The SRID is technically already a foreign key into another table, with
> dimension and SRID as other information. At the moment, we bit-pack
> the dimension and geometry type into the SRID and use that as the
> typmod, but this only leaves 21 bits IIRC for the SRID. The additional
> complication is that SRIDs at the higher end of the range are allowed
> for anyone to use, and so people may have their own customised
> spheroids defined in this region of the table.

Sounds like you need space for all of these separately, and once you have that, you'll probably start thinking about other possible pieces of information you could store, especially as you start to store new data types and have new operators on them. It sounds to me as though the typmod, or something like it, needs to be able to store, in general, completely arbitrary information, although not likely much over a page or two of memory. Cf. Robert Haas's recent comment about Klingon.

> While we could make this a completely opaque structure from the SQL
> level, I'm not sure it's a good idea. If we had a foreign key into
> another table, we'd need to ensure that no one could tamper with it,
> as otherwise all chaos would break loose, e.g. breaking the geometry
> type constraint on a column.

Our system catalog tables have the nice property that no one can accidentally modify them by hand, and all chaos, as you put it, can reasonably be expected to occur should they choose to do so.

> Heck, we even have people deleting the geometry_columns table
> sometimes because they are not aware of what it does. By storing this
> information in the PG catalog this can't happen, plus the information
> is easily available in Form_pg_attribute without having to implement
> our own cache, with its own related problems such as how/when to
> invalidate etc. :)
>
> There is also a chance that we'd want to include additional
> information in the future related to geometry validity, for example,
> which would mean further reducing the range allowed within the
> spatial_ref_sys table in its existing form.

Right.

Cheers, David.

--
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
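To illustrate the kind of bit-packing Mark describes, here is a self-contained toy in the spirit of (but not identical to) PostGIS's scheme. The field widths and the sample geometry-type code are assumptions for illustration only:

    #include <stdint.h>
    #include <stdio.h>

    /* 21 bits of SRID, 2 bits of dimension, 8 bits of geometry type = 31 bits,
     * which just fits in the single non-negative int32 a typmod provides. */
    #define TYPMOD_SET(srid, dims, type) \
        (((int32_t)(srid) & 0x1FFFFF) | \
         (((int32_t)(dims) & 0x3) << 21) | \
         (((int32_t)(type) & 0xFF) << 23))
    #define TYPMOD_SRID(t) ((t) & 0x1FFFFF)
    #define TYPMOD_DIMS(t) (((t) >> 21) & 0x3)
    #define TYPMOD_TYPE(t) (((t) >> 23) & 0xFF)

    int
    main(void)
    {
        /* SRID 4326, 2-D, hypothetical geometry-type code 7 */
        int32_t typmod = TYPMOD_SET(4326, 2, 7);

        printf("srid=%d dims=%d type=%d\n",
               (int) TYPMOD_SRID(typmod),
               (int) TYPMOD_DIMS(typmod),
               (int) TYPMOD_TYPE(typmod));
        return 0;
    }

Once 21 + 2 + 8 bits are spoken for, the int32 is essentially full - which is exactly why there is no room left for further attributes such as validity flags, and why the thread is discussing richer typmod representations.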
Re: [HACKERS] [PATH] Idle in transaction cancellation
Andres Freund <and...@anarazel.de> wrote:
> Here is a proposed patch which enables cancellation of $subject.

Cool. Some enhancements we'd like to make to Serializable Snapshot Isolation (SSI), should the base patch make it in, would require this capability.

> Currently it does *not* report any special error message to the client
> if it starts sending commands in an (unbeknownst to it) failed
> transaction, but just the normal 25P02: current transaction is
> aborted... message. It shouldn't be hard to add that, and I will
> propose a patch if people would like it (I personally am not very
> interested, but I can see people validly wanting it)

For SSI purposes, it would be highly desirable to be able to set the SQLSTATE and message generated when the canceled transaction terminates.

> but I would like to have some feedback on the patch first.

I'll take a look when I can, but it may be a couple weeks from now.

-Kevin
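For readers wondering what "the normal 25P02 message" means on the client side: a libpq application can already branch on the SQLSTATE rather than on the message text, which is why the no-special-error behavior is workable. A minimal sketch (the connection string is a hypothetical placeholder):

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=test");   /* hypothetical conninfo */
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* Suppose the backend cancelled us while idle in transaction... */
        res = PQexec(conn, "SELECT 1");
        if (PQresultStatus(res) == PGRES_FATAL_ERROR)
        {
            const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

            /* 25P02 = in_failed_sql_transaction */
            if (sqlstate && strcmp(sqlstate, "25P02") == 0)
                fprintf(stderr, "transaction aborted underneath us; "
                                "ROLLBACK and retry\n");
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }

Kevin's request goes one step further: letting the *server-side* canceler choose the SQLSTATE and message the doomed transaction eventually reports.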
Re: [HACKERS] [PATH] Idle in transaction cancellation
On Tue, Oct 19, 2010 at 10:18 AM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> Andres Freund <and...@anarazel.de> wrote:
>> Here is a proposed patch which enables cancellation of $subject.
>
> I'll take a look when I can, but it may be a couple weeks from now.

How about adding it to https://commitfest.postgresql.org/action/commitfest_view/open ? Seems like a good feature, at least from 20,000 feet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Extensions, this time with a patch
Dimitri Fontaine <dimi...@2ndquadrant.fr> writes:
> Robert Haas <robertmh...@gmail.com> writes:
>> I don't see why. I think the real action item here is to remove =>
>> from hstore.
>
> As input, consider that lots of extensions will create types that are
> only a shell at the moment of the CREATE TYPE, and for each of those
> types you will see the (potentially 1000-line-long) whole SQL script
> dumped on the screen.

Only if the script is intentionally noisy. The right fix here is probably to bump up the min message level while running the script.

regards, tom lane
Re: [HACKERS] leaky views, yet again
KaiGai Kohei <kai...@kaigai.gr.jp> writes:
> (2010/10/19 21:31), Robert Haas wrote:
>> I think you're still misreading it.
>
> Hmm. In my understanding, he was concerned about possibly leaky
> estimator functions, so he mentioned the horrible performance
> degradation if we don't allow these functions to be executed.

No, it wasn't about estimator functions at all; it was about what we do in the absence of an estimator.

> So, I suggested an idea that enforces all estimator functions being
> installed by superusers; that would enable us to assume they are safe
> enough.

Even if we were willing to accept a limitation as stringent as that, preventing people from installing estimators is hardly going to cure the problem.

regards, tom lane
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Sat, Oct 16, 2010 at 12:59 PM, Terry Laurenzo <t...@laurenzo.org> wrote:
> - It is directly iterable without parsing and/or constructing an AST
> - It is its own representation. If iterating and you want to tear off
>   a value to be returned or used elsewhere, it's a simple buffer copy
>   plus some bit twiddling.
> - It is conceivable that clients already know how to deal with BSON,
>   allowing them to work with the internal form directly (ala MongoDB)
> - It stores a wider range of primitive types than JSON-text. The most
>   important are Date and binary.

When last I looked at that, it appeared to me that what BSON could represent was a subset of what JSON could represent - in particular, that it had things like a 32-bit limit on integers, or something along those lines. Sounds like it may be neither a superset nor a subset, in which case I think it's a poor choice for an internal representation of JSON.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
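The integer-range concern is easy to check mechanically. Here is a small standalone sketch of the test a fixed-width binary encoder would need to apply; the specific literals are illustrative, and note that the second one fits an int64 but would already be lossy if the encoder fell back to a double:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Can this JSON integer literal round-trip through a fixed-width int64? */
    static void
    check(const char *literal)
    {
        char      *end;
        long long  v;

        errno = 0;
        v = strtoll(literal, &end, 10);
        if (errno == ERANGE || *end != '\0')
            printf("%s: does not fit in int64; lossy in a fixed-width format\n",
                   literal);
        else
            printf("%s: ok as int64 (%lld)\n", literal, v);
    }

    int
    main(void)
    {
        check("2147483648");           /* > INT32_MAX, fine in int64 */
        check("9007199254740993");     /* 2^53 + 1: ok as int64, lossy as double */
        check("170141183460469231731687303715884105727"); /* 2^127-1: overflows */
        return 0;
    }

JSON's grammar puts no bound at all on numeric literals, so any representation built from fixed-width integers and IEEE doubles can only approximate it - which is the "neither a superset nor a subset" problem.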
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On 10/19/2010 10:44 AM, Robert Haas wrote:
> On Sat, Oct 16, 2010 at 12:59 PM, Terry Laurenzo <t...@laurenzo.org> wrote:
>> [list of BSON advantages snipped]
>
> When last I looked at that, it appeared to me that what BSON could
> represent was a subset of what JSON could represent - in particular,
> that it had things like a 32-bit limit on integers, or something along
> those lines. Sounds like it may be neither a superset nor a subset, in
> which case I think it's a poor choice for an internal representation
> of JSON.

Yeah, if it can't handle arbitrary-precision numbers, as has previously been stated, it's dead in the water for our purposes, I think.

cheers
andrew
Re: [HACKERS] Extensions, this time with a patch
Tom Lane <t...@sss.pgh.pa.us> writes:
> Only if the script is intentionally noisy. The right fix here is
> probably to bump up the min message level while running the script.

You mean doing that from the SQL script itself (using SET), or in the pg_execute_from_file() code? My guess is the former, but...

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] O_DSYNC broken on MacOS X?
Robert Haas wrote:
> On Thu, Oct 7, 2010 at 11:52 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> Proposed doc patch attached.
>>
>> discusesed?  Otherwise +1
>
> Woops, thanks. Committed with that change. I back-patched it back to
> 8.3, which is as far as it applied with only minor conflicts.

I have applied the attached patch, which mentions tools/fsync for testing fsync method performance, and clarifies the new paragraph about sync methods. I am glad to see we are beefing up this area of the docs.

--
Bruce Momjian <br...@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 87d182c..f178835 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1569,13 +1569,13 @@ SET ENABLE_SEQSCAN TO OFF;
    </itemizedlist>
    <para>
     Not all of these choices are available on all platforms.
-    The default is the first method in the above list that is supported
-    by the platform. The default is not necessarily best; it may be
-    necessary to change this setting, or other aspects of your system
-    configuration, in order to create a crash-safe configuration, as
-    discussed in <xref linkend="wal-reliability">, or to achieve best
-    performance.
     The <literal>open_</>* options also use <literal>O_DIRECT</> if available.
+    The default is the first method in the above list that is supported
+    by the platform. The default is not necessarily ideal; it might be
+    necessary to change this setting or other aspects of your system
+    configuration in order to create a crash-safe configuration or
+    achieve optimal performance.
+    These aspects are discussed in <xref linkend="wal-reliability">.
     The utility <filename>src/tools/fsync</> in the PostgreSQL source
     tree can do performance testing of various fsync methods.
     This parameter can only be set in the <filename>postgresql.conf</>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 5678d6e..7b50bdd 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -530,11 +530,13 @@
   <para>
    The <xref linkend="guc-wal-sync-method"> parameter determines how
    <productname>PostgreSQL</productname> will ask the kernel to force
-   <acronym>WAL</acronym> updates out to disk.
-   With the exception of <literal>fsync_writethrough</>, which can sometimes
-   force a flush of the disk cache even when other options do not do so,
-   all the options should be the same in terms of reliability.
-   However, it's quite platform-specific which one will be the fastest.
+   <acronym>WAL</acronym> updates out to disk.
+   All the options should be the same in terms of reliability, with
+   the exception of <literal>fsync_writethrough</>, which can sometimes
+   force a flush of the disk cache even when other options do not do so.
+   However, it's quite platform-specific which one will be the fastest;
+   you can test option speeds using the utility <filename>src/tools/fsync</>
+   in the PostgreSQL source tree.
    Note that this parameter is irrelevant if <varname>fsync</varname>
    has been turned off.
   </para>
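For anyone who cannot build src/tools/fsync, the shape of such a test is simple. This is a hypothetical standalone sketch, not the real tool (which also exercises O_DSYNC, fdatasync, and friends); it just times repeated synced 8kB writes, the pattern WAL commits produce:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define WRITES 1000

    int
    main(void)
    {
        char            buf[8192];
        struct timeval  start, end;
        double          secs;
        int             i;
        int             fd = open("fsync_test_file",
                                  O_WRONLY | O_CREAT | O_TRUNC, 0600);

        if (fd < 0) { perror("open"); return 1; }
        memset(buf, 0, sizeof(buf));

        gettimeofday(&start, NULL);
        for (i = 0; i < WRITES; i++)
        {
            /* rewrite the same 8kB block and force it to stable storage */
            if (lseek(fd, 0, SEEK_SET) < 0 ||
                write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
                fsync(fd) != 0)
            { perror("write/fsync"); return 1; }
        }
        gettimeofday(&end, NULL);

        secs = (end.tv_sec - start.tv_sec) +
               (end.tv_usec - start.tv_usec) / 1e6;
        printf("%d fsync'd 8kB writes in %.3f s (%.0f ops/s)\n",
               WRITES, secs, WRITES / secs);
        close(fd);
        unlink("fsync_test_file");
        return 0;
    }

On a drive with a volatile write cache, the ops/s figure will wildly exceed the platter's rotational limit - exactly the unsafe-cache situation the reliability section of the docs warns about.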
Re: [HACKERS] knngist - 0.8
3) a third boolean column (amopopr, amopfamily, amoporder) - there could be two records per operator + the operator could be used in both roles + the strategy number could be different for different roles How can #3 work at all? It's ignoring a couple of critical index columns. In particular, I believe the sticking point here is this unique index: pg_amop_fam_strat_index UNIQUE, btree (amopfamily, amoplefttype, amoprighttype, amopstrategy) I believe that the columns (amoplefttype, amoprighttype) are fixed for an operator and cannot be changed. So, two entries for one operator in an opfamily would have to differ in strategy number for the different roles. That also gives a direct way to transfer knowledge about the role to the consistent method of GiST. I'm not terribly thrilled with using a boolean here in any case. Now that we have two roles an operator might play in an opclass, who's to say there might not be more roles someday? We should use a column type that will support more than two roles without basic rejiggering. It's easy to change to a char type; for now only the characters 's' (search) and 'o' (order) would be allowed. BTW, have we discussed the idea of embedding the role in the strategy number? For example, require regular operators to have strategy ... In that scheme, one operator could not be used in more than one role. Right now that's not a strict limitation, because we still don't have an example of a boolean distance. Anyhow, we need to distinguish the roles while the IndexPath is built. So, op_in_opfamily/get_op_opfamily_strategy/get_op_opfamily_properties and friends will accept an extra argument indicating the role of interest. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist - 0.8
Thinking about it that way, perhaps we could add an integer column amop_whats_it_good_for that gets used as a bit field. That wouldn't require changing the index structure, although it might break some other things. OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH | ORDER } [, ...] ] It's a very desirable thing to be able to distinguish roles in the consistent method of GiST: computation of distance could be very expensive. The only way to provide that in the current GiST interface is the strategy number. Of course, we could add a sixth argument to consistent to indicate the role, but I don't think that's a good decision. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
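For what the bit-field idea might look like at the C level, here is a sketch with invented names (neither the flags nor the column exist in the tree; this is only to illustrate how one catalog row could carry several roles at once):

/* Hypothetical role flags for an integer pg_amop column used as a bit field. */
#define AMOP_FOR_SEARCH  (1 << 0)   /* usable in search (WHERE) quals */
#define AMOP_FOR_ORDER   (1 << 1)   /* usable in ORDER BY col OP constant */

/* One row could then declare both roles without a second entry: */
static int
amop_supports_order(int purpose_flags)
{
    return (purpose_flags & AMOP_FOR_ORDER) != 0;
}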
Re: [HACKERS] Simplifying replication
Josh Berkus wrote: It is critical that we make replication easier to set up, administrate and monitor than it currently is. In my conversations with people, this is more important to our users and the adoption of PostgreSQL than synchronous replication is. You should enjoy one of the patches we're furiously working on then, which is aiming at some of the administration and monitoring pieces here. I have my own grand vision of how easy replication should be to set up too. Visions and plans are nice, but building functional pieces of them and delivering them to the community is what actually moves PostgreSQL forward. So far, multiple people have done that for sync rep, and what we're supposed to be focused on at this stage in the development cycle is finishing the work related to the open CommitFest item that includes that. I find this launch into a new round of bike-shedding a bit distracting. If you want this to be easier to use, which it's obvious to any observer it should be because what's delivered in 9.0 is way too complicated, please work on finding development resources to assign to that problem. Because that's the bottleneck on simplifying things, not ideas about what to do. I would recommend finding or assigning a developer to work on integrating base backup into the streaming protocol as the biggest single thing that would improve the built-in replication. All of the rest of the trivia about what knobs to set and such are tiny details that make for only a minor improvement until that's taken care of. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Agreed. BSON was born out of implementations that either lacked arbitrary precision numbers or had a strong affinity to an int/floating point way of thinking about numbers. I believe that if BSON had an arbitrary precision number type, it would be a proper superset of JSON. As an aside, the maximum range of an int in BSON is 64 bits. Back to my original comment that BSON was grown instead of designed, it looks like both the 32bit and 64bit integers were added late in the game and that the original designers perhaps were just going to store all numbers as double. Perhaps we should enumerate the attributes of what would make a good binary encoding? Terry On Tue, Oct 19, 2010 at 8:57 AM, Andrew Dunstan and...@dunslane.net wrote: On 10/19/2010 10:44 AM, Robert Haas wrote: On Sat, Oct 16, 2010 at 12:59 PM, Terry Laurenzo t...@laurenzo.org wrote: - It is directly iterable without parsing and/or constructing an AST - It is its own representation. If iterating and you want to tear off a value to be returned or used elsewhere, it's a simple buffer copy plus some bit twiddling. - It is conceivable that clients already know how to deal with BSON, allowing them to work with the internal form directly (ala MongoDB) - It stores a wider range of primitive types than JSON-text. The most important are Date and binary. When last I looked at that, it appeared to me that what BSON could represent was a subset of what JSON could represent - in particular, that it had things like a 32-bit limit on integers, or something along those lines. Sounds like it may be neither a superset nor a subset, in which case I think it's a poor choice for an internal representation of JSON. Yeah, if it can't handle arbitrary precision numbers as has previously been stated it's dead in the water for our purposes, I think. cheers andrew
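The arbitrary-precision point is easy to demonstrate with a standalone program (the literal below is arbitrary): a perfectly legal JSON number survives neither of BSON's fixed-width representations.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <stdint.h>
#include <inttypes.h>

int
main(void)
{
    const char *json_num = "12345678901234567890123456789"; /* legal JSON */

    errno = 0;
    int64_t as_int = strtoll(json_num, NULL, 10);   /* clamps on overflow */
    printf("int64:  %" PRId64 " (errno == ERANGE: %d)\n",
           as_int, errno == ERANGE);

    double as_dbl = strtod(json_num, NULL);         /* keeps ~15-16 digits */
    printf("double: %.17g\n", as_dbl);
    return 0;
}

Any encoding that funnels numbers through int32/int64/double therefore loses information relative to JSON text, which is exactly the superset/subset question at issue here.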
Re: [HACKERS] O_DSYNC broken on MacOS X?
Greg Smith wrote: A.M. wrote: Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt the DBA to check that the numbers are within the realm of possibility. This is what the test_fsync utility that already ships with the database should be useful for. The way Bruce changed it to report numbers in commits/second for 9.0 makes it a lot easier to use for this purpose than it used to be. I think there's still some additional improvements that could be made there, but it's a tricky test to run accurately. test_fsync was designed to test various things, like whether several open-sync writes are better than two writes and an fsync, and whether you can fsync data written on a different file descriptor. It is really a catch-all test right now, not one specific for choosing sync methods. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/JS
Cool, will take a look. Thanks! --- On Tue, 19/10/10, Itagaki Takahiro itagaki.takah...@gmail.com wrote: From: Itagaki Takahiro itagaki.takah...@gmail.com Subject: Re: [HACKERS] PL/JS To: Terri Laurenzo t...@laurenzo.org Cc: Greg grigo...@yahoo.co.uk, pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org Date: Tuesday, 19 October, 2010, 3:18 On Tue, Oct 19, 2010 at 9:04 AM, Terri Laurenzo t...@laurenzo.org wrote: Is there a pl/js yet? I've not been able to find anything but was thinking about putting something together based on v8 or spidermonkey. I'm working on PL/v8, that is one of PL/JS (PL/JavaScript) implementation based on v8. http://code.google.com/p/plv8js/ -- Itagaki Takahiro
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Oct 19, 2010, at 11:22 AM, Bruce Momjian wrote: Greg Smith wrote: A.M. wrote: Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt the DBA to check that the numbers are within the realm of possibility. This is what the test_fsync utility that already ships with the database should be useful for. The way Bruce changed it to report numbers in commits/second for 9.0 makes it a lot easier to use for this purpose than it used to be. I think there's still some additional improvements that could be made there, but it's a tricky test to run accurately. test_fsync was designed to test various things, like whether several open-sync writes are better than two writes and an fsync, and whether you can fsync data written on a different file descriptor. It is really a catch-all test right now, not one specific for choosing sync methods. I am working on simplifying the test_fsync tool and making it a contrib function which can be run by the superuser against the configured fsync method. That way, the list can ask a user to run it and report fsyncs-per-second whenever the numbers look suspicious. The goal is to make it more accessible. I was also thinking about adding some notes along the lines of "Your drive's fsync speed rates between a 5400 RPM SATA drive and a 7200 RPM SATA drive." or "Your drive's fsync speed rates as high as RAM; your fsync method may be wrong." Currently, the test tool is not even compiled by default. Thoughts? Cheers, M -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
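The core of such a tool is a timed write-and-fsync loop; a stripped-down sketch follows (the file name and iteration count are arbitrary, and test_fsync itself exercises many more variations):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int
main(void)
{
    const int   N = 1000;
    char        buf[8192] = {0};
    int         fd = open("fsync_test.out", O_RDWR | O_CREAT, 0600);
    struct timeval start, end;

    if (fd < 0)
        return 1;
    gettimeofday(&start, NULL);
    for (int i = 0; i < N; i++)
    {
        /* rewrite the same block and force it out, like a commit record */
        if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf) ||
            fsync(fd) != 0)
            return 1;
    }
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec) +
                  (end.tv_usec - start.tv_usec) / 1e6;
    printf("%.0f fsyncs/second\n", N / secs);
    close(fd);
    return 0;
}

As a rough calibration for the proposed warnings: a 7200 RPM disk that honestly flushes its cache can complete at most about 120 such cycles per second (one per platter revolution), so results in the thousands usually mean a cache somewhere is absorbing the fsync.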
Re: [HACKERS] knngist - 0.8
combination of parameters is one that it can handle. I'm thinking perhaps in lieu of a boolean, we can add another indexam method which, if not InvalidOid, gets called when we're wondering about whether a given clause is something that the index can order by. Although knngist focuses on the ORDER BY col OP constant case, which ... Hmm, interesting idea. To be more general, amcanorder (without the byop suffix) could be eliminated too. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Only if the script is intentionally noisy. The right fix here is probably to bump up the min message level while running the script. You mean doing that from the SQL script itself (using SET) or in the pg_execute_from_file() code? My guess is the former, but... You could argue that either way I guess. The script knows what it needs, but OTOH just about every extension there is will probably be generating useless NOTICEs unless something is done, so maybe the extension management code should take care of it for them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
You should enjoy one of the patches we're furiously working on then, which is aiming at some of the administration and monitoring pieces here. Great, glad to hear it! Would you be willing to go into detail? I have my own grand vision of how easy replication should be to set up too. So, share it. I'd look forward to hearing it, especially since your vision probably takes synch rep and quorum commit into account, which mine doesn't. If not here, then on your blog. Visions and plans are nice, but building functional pieces of them and delivering them to the community is what actually moves PostgreSQL forward. *shrug*. Robert asked me to write it up for the list based on the discussions around synch rep. Now you're going to bash me for doing so? Many of the goals I described will mean removing knobs and changing defaults, or even forgoing fine-grained control entirely. If we don't have agreement that simplifying replication is a high-priority goal, then it won't happen; anyone submitting a patch will be this-or-that-use-cased to death and will give up. For that matter, I'm not sure that everyone agrees that simplification is a worthwhile goal. For example, somewhere between 9.0beta4 and final release, someone changed the defaults for max_wal_senders and hot_standby to 0 and off. I don't remember there even being discussion about it. The discussion around synch rep certainly showed that the natural tendency of this list is to add complexity with each incarnation of a feature. It's the easiest way to accommodate conflicting use cases, but it's not the best way. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] create tablespace fails silently, or succeeds improperly
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Looking at the pg_upgrade code some more, I found that it was not removing the PG_VERSION file when deleting = 8.4 tablespace files. This might confuse administrators so the attached patch adds the removal of PG_VERSION. I would like to apply this to master and 9.0.X. ... why is that a good idea? The script already deletes the 8.4 database directories, but leaves PG_VERSION behind. Why keep it when all the 8.4 data is gone? The script also dates PGDATA for 8.4, so there is nothing left pointing to that directory. Oh, I misunderstood: I thought you were proposing to do this as an automatic action inside pg_upgrade. If it's part of the cleanup script, it's fine. Yes, never automatic. You can always roll back after pg_upgrade completes. Once you start the new server, only then can you not go back. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
Josh Berkus wrote: *shrug*. Robert asked me to write it up for the list based on the discussions around synch rep. Now you're going to bash me for doing so? Sorry, next time I'll make sure to bash Robert too. I don't have any problems with the basic ideas you're proposing, just concerns about when the right time to get into that whole giant subject is and who is going to work on it. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Stefan, Dimitri, I disagree - limiting the maximum number of replication connections is important for my use cases. Can you explain more? I clearly don't understand your use case. If we want something fixed *now*, should we perhaps just bump the *default* value for max_wal_senders to 5 or something? If we're not going to remove it, then the default should be -1, which should mean the same as max_connections. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Josh Berkus wrote: Under what bizarre set of circumstances would anyone have runaway connections from replicas to the master? Cloud computing deployments where additional replicas are brought up automatically in response to demand. It's easy to imagine a situation where a standby instance is spawned, starts to sync, and that additional load triggers *another* standby to come on board; repeat until the master is doing nothing but servicing standby sync requests. max_wal_senders provides a safety valve for that. I think Magnus's idea to bump the default to 5 triages the worst of the annoyance here, without dropping the feature (which has uses) or waiting for new development to complete. I'd be in favor of just committing that change right now, before it gets forgotten about, and then if nobody else gets around to further work at least something improved here for 9.1. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Tom Lane t...@sss.pgh.pa.us writes: You could argue that either way I guess. The script knows what it needs, but OTOH just about every extension there is will probably be generating useless NOTICEs unless something is done, so maybe the extension management code should take care of it for them. Either way is the key here too, so please find attached a revised (v5) patch which will force log_min_messages and client_min_messages to WARNING while the script is run. v5 also contains the \dx bug fix about repalloc. Please note that I didn't touch any contrib yet, so that hstore will still dump its full script here: dim=# create extension isn; NOTICE: Installing extension 'isn' from '/Users/dim/pgsql/exts/share/contrib/isn.sql', with user data CREATE EXTENSION dim=# create extension hstore; NOTICE: Installing extension 'hstore' from '/Users/dim/pgsql/exts/share/contrib/hstore.sql', with user data WARNING: = is deprecated as an operator name DETAIL: This name may be disallowed altogether in future versions of PostgreSQL. CONTEXT: SQL statement /* contrib/hstore/hstore.sql.in */ The script follows here. Maybe 9.1 is when to deprecate = as an operator name in the official hstore extension? :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support Of course the git repo is still maintained for those preferring it: http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension extension.v5.patch.gz Description: pg_dump support for PostgreSQL extensions -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
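For readers who want the shape of that guard without reading the patch, here is a hypothetical helper (not the patch's actual code): SET LOCAL scopes the change to the surrounding transaction, so nothing needs to be restored by hand.

#include "postgres.h"
#include "executor/spi.h"

/* Hypothetical helper: run an extension script with message levels forced
 * to WARNING for the duration of the transaction.  A fuller version would
 * first check whether client_min_messages is already above WARNING and
 * leave it alone in that case. */
static void
execute_script_quietly(const char *script)
{
    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    if (SPI_execute("SET LOCAL client_min_messages TO warning", false, 0) < 0)
        elog(ERROR, "could not set client_min_messages");
    if (SPI_execute("SET LOCAL log_min_messages TO warning", false, 0) < 0)
        elog(ERROR, "could not set log_min_messages");

    if (SPI_execute(script, false, 0) < SPI_OK_UTILITY)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_EXCEPTION),
                 errmsg("extension script could not be executed")));

    if (SPI_finish() != SPI_OK_FINISH)
        elog(ERROR, "SPI_finish failed");
}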
Re: [HACKERS] Extensions, this time with a patch
Excerpts from Dimitri Fontaine's message of dom oct 17 16:30:47 -0300 2010: The bulk of it is now short enough to be inlined in the mail, and if you have more comments I guess they'll be directed at this portion of the patch, so let's make it easy: /* * We abuse some internal knowledge from spi.h here. As we don't know * which queries are going to get executed, we don't know what to expect * as an OK return code from SPI_execute(). We assume that * SPI_OK_CONNECT, SPI_OK_FINISH and SPI_OK_FETCH are quite improbable, * though, and the errors are negatives. So a valid return code is * considered to be SPI_OK_UTILITY or anything from there. */ SPI_connect(); if (SPI_execute(query_string, false, 0) < SPI_OK_UTILITY) ereport(ERROR, (errcode(ERRCODE_DATA_EXCEPTION), errmsg("File '%s' contains invalid query", filename))); SPI_finish(); SPI_OK_FETCH is indeed improbable -- it's not used anywhere in the SPI routines, and hasn't been for ages. SPI_OK_CONNECT and SPI_OK_FINISH are only issued by SPI_connect and SPI_finish, so presumably they can't be returned by a script. The error message wording needs some work; maybe errmsg("file \"%s\" could not be executed", filename) SPI_execute sets the query as errcontext, which is good; but apparently there is no error position, which is bad. I'm not sure how feasible it is to fix this. (Not this patch's responsibility anyway.) I happened to notice that there are several pieces of code that are calling SPI_connect and SPI_finish without checking the return value, notably the FTS code and xml.c. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Excerpts from Dimitri Fontaine's message of vie oct 15 16:15:23 -0300 2010: The cfparser patch didn't change, the current version is still v1. Hmm, this needs some cleanup; the comments still talk about the old function name; and about just the recovery.conf file. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
A.M. wrote: On Oct 19, 2010, at 11:22 AM, Bruce Momjian wrote: Greg Smith wrote: A.M. wrote: Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt the DBA to check that the numbers are within the realm of possibility. This is what the test_fsync utility that already ships with the database should be useful for. The way Bruce changed it to report numbers in commits/second for 9.0 makes it a lot easier to use for this purpose than it used to be. I think there's still some additional improvements that could be made there, but it's a tricky test to run accurately. test_fsync was designed to test various things, like whether several open-sync writes are better than two writes and an fsync, and whether you can fsync data written on a different file descriptor. It is really a catch-all test right now, not one specific for choosing sync methods. I am working on simplifying the test_fsync tool and making it a contrib function which can be run by the superuser against the configured fsync method. That way, the list can ask a user to run it and report fsyncs-per-second whenever the numbers look suspicious. The goal is to make it more accessible. I was also thinking about adding some notes along the lines of "Your drive's fsync speed rates between a 5400 RPM SATA drive and a 7200 RPM SATA drive." or "Your drive's fsync speed rates as high as RAM; your fsync method may be wrong." Currently, the test tool is not even compiled by default. Thoughts? Agreed. Let me know if you have any questions. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
Dimitri, Greg, I want to say a big big +1 here. The way replication and PITR setup are implemented now makes a very good prototype; it's time to consolidate and get to something usable by normal people, as opposed to PostgreSQL full-time geeks. Well, one thing to be addressed is separating the PITR functionality from replication. PITR needs a lot of features -- timelines, recovery stop points, etc. -- which replication doesn't need or want. I think that focussing on streaming replication functionality and ignoring the archive logs case is probably the best way to logically separate these two. Presumably anyone who needs archive logs as well will be a professional DBA. I could prepare a patch given some advice on the replication protocol integration. For one, is streaming a base backup something that walsender should care about? Yeah, I thought there was a prototype for this somewhere. From a user perspective, using a 2nd pgport connection for the initial clone is fine. I don't know if we want to worry about it otherwise from a resource management perspective; presumably the cloning process is going to be a pretty big performance hit on the master. BTW, do we have a clear idea of how to implement pg_ping, and should it report the current WAL location(s) of a standby? pg_ping? That needs a way to define a group of standbys. There's nothing there that makes them know about each other. Let me clarify. I meant that if I try to make a *single* standby point to a new master, and that new master was behind the standby when it failed over, then the attempt to remaster should fail with an error. I do *not* want to get into standby groups. That way lies madness. ;-) Now say we have pg_ping (or another tool) returning the current recv, applied and synced LSNs; it would be possible for any standby to figure out which other ones must be shot in case you fail over here. The failover command could list those other standbys in the group that you're behind, and with a force command allow you to still fail over to this one. Now you have to STONITH the ones listed, but that's your problem after all. The LSN isn't enough; as others have pointed out, we have a fairly serious failure case if a standby comes up as a master, accepts transactions, and then we try to remaster a 2nd standby which was actually ahead of the first standby at the time of master failure. I haven't seen a solution posted to that yet; maybe I missed it? Sorry, next time I'll make sure to bash Robert too. I don't have any problems with the basic ideas you're proposing, just concerns about when the right time to get into that whole giant subject is and who is going to work on. If not now, when? The 2nd CommitFest is almost complete. If we're going to make any substantial changes, we need to have patches for the 3rd commitfest. And I didn't see anyone discussing simplification until I brought it up. I don't realistically think that we're going to get 100% simplification for 9.1. But it would be nice to at least get some components, which means getting agreement on how things should work, at least roughly. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
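The refuse-to-remaster check Josh asks for could start as a simple LSN comparison; a sketch with invented names (a real implementation would obtain the standby's last-applied position and the new master's current end of WAL through the protocol):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrLike;    /* stand-in for XLogRecPtr */

/* Refuse to follow a new master whose WAL ends before what we already
 * replayed: our data has diverged and needs a fresh base backup. */
static bool
remaster_is_safe(XLogRecPtrLike standby_applied, XLogRecPtrLike master_end)
{
    return standby_applied <= master_end;
}

As the thread notes, though, the LSN alone cannot detect two servers that generated different WAL in the same LSN range; that divergence case is what timeline identifiers are meant to catch.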
Re: [HACKERS] max_wal_senders must die
On 10/19/2010 09:06 AM, Greg Smith wrote: I think Magnus's idea to bump the default to 5 triages the worst of the annoyance here, without dropping the feature (which has uses) or waiting for new development to complete. I'd be in favor of just committing that change right now, before it gets forgotten about, and then if nobody else gets around to further work at least something improved here for 9.1. Heck, even *I* could write that patch, if we're agreed. Although you can commit it. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Alvaro Herrera alvhe...@commandprompt.com writes: The error message wording needs some work; maybe errmsg("file \"%s\" could not be executed", filename) [...] I happened to notice that there are several pieces of code that are calling SPI_connect and SPI_finish without checking the return value, notably the FTS code and xml.c. Please find attached pg_execute_from_file.v4.patch with fixes. I've used the usual error messages for SPI_connect() and finish this time. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 13897,13902 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 13897,13909 entrytyperecord/type/entry entryReturn information about a file/entry /row + row +entry + literalfunctionpg_execute_from_file(parameterfilename/ typetext/)/function/literal +/entry +entrytypevoid/type/entry +entryExecutes the acronymSQL/ commands contained in a file/entry + /row /tbody /tgroup /table *** *** 13935,13940 SELECT (pg_stat_file('filename')).modification; --- 13942,13956 /programlisting /para +indexterm + primarypg_execute_from_file/primary +/indexterm +para + functionpg_execute_from_file/ makes the server + execute acronymSQL/ commands to be found in a file. This function is + reserved to superusers. +/para + para The functions shown in xref linkend=functions-advisory-locks manage advisory locks. For details about proper use of these functions, see *** *** 13957,13962 SELECT (pg_stat_file('filename')).modification; --- 13973,13979 entrytypevoid/type/entry entryObtain exclusive advisory lock/entry /row + row entry literalfunctionpg_advisory_lock(parameterkey1/ typeint/, parameterkey2/ typeint/)/function/literal *** a/src/backend/utils/adt/genfile.c --- b/src/backend/utils/adt/genfile.c *** *** 7,12 --- 7,13 * Copyright (c) 2004-2010, PostgreSQL Global Development Group * * Author: Andreas Pflug pgad...@pse-consulting.de + * Dimitri Fontaine dimi...@2ndquadrant.fr * * IDENTIFICATION * src/backend/utils/adt/genfile.c *** *** 21,26 --- 22,28 #include <dirent.h> #include "catalog/pg_type.h" + #include "executor/spi.h" #include "funcapi.h" #include "mb/pg_wchar.h" #include "miscadmin.h" *** *** 264,266 pg_ls_dir(PG_FUNCTION_ARGS) --- 266,340 SRF_RETURN_DONE(funcctx); } + + /* + * Read a file then execute the SQL commands it contains. + */ + Datum + pg_execute_from_file(PG_FUNCTION_ARGS) + { + text *filename_t = PG_GETARG_TEXT_P(0); + char *filename; + FILE *file; + int64 fsize = -1, nbytes; + struct stat fst; + char *query_string = NULL; + + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to get file information"))); + + /* + * Only superuser can call pg_execute_from_file, and CREATE EXTENSION + * uses that too. Don't double check the PATH. Also note that + * extension's install files are not in $PGDATA but `pg_config + * --sharedir`.
+ */ + filename = text_to_cstring(filename_t); + + if (stat(filename, &fst) < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not stat file \"%s\": %m", filename))); + + fsize = (int64) fst.st_size; + + if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not open file \"%s\" for reading: %m", + filename))); + + if (ferror(file)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not read file \"%s\": %m", filename))); + + query_string = (char *) palloc((fsize + 1) * sizeof(char)); + memset(query_string, 0, fsize + 1); + nbytes = fread(query_string, 1, (size_t) fsize, file); + pg_verifymbstr(query_string, nbytes, false); + FreeFile(file); + + /* + * We abuse some internal knowledge from spi.h here. As we don't know + * which queries are going to get executed, we don't know what to expect + * as an OK return code from SPI_execute(). We assume that + * SPI_OK_CONNECT, SPI_OK_FINISH and SPI_OK_FETCH are quite improbable, + * though, and the errors are negatives. So a valid return code is + * considered to be SPI_OK_UTILITY or anything from there. + */ + if (SPI_connect() != SPI_OK_CONNECT) + elog(ERROR, "SPI_connect failed"); + + if (SPI_execute(query_string, false, 0) < SPI_OK_UTILITY) + ereport(ERROR, + (errcode(ERRCODE_DATA_EXCEPTION), + errmsg("File '%s' could not be executed", filename))); + + if (SPI_finish() != SPI_OK_FINISH) + elog(ERROR, "SPI_finish failed"); +
Re: [HACKERS] Extensions, this time with a patch
Alvaro Herrera alvhe...@commandprompt.com writes: Hmm, this needs some cleanup; the comments still talk about the old function name; and about just the recovery.conf file. Ah yes, thinking it's an easy patch is not helping. Please find attached a revised version of it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 55,60 --- 55,61 #include utils/guc.h #include utils/ps_status.h #include utils/relmapper.h + #include utils/cfparser.h #include pg_trace.h *** *** 5018,5117 str_time(pg_time_t tnow) } /* - * Parse one line from recovery.conf. 'cmdline' is the raw line from the - * file. If the line is parsed successfully, returns true, false indicates - * syntax error. On success, *key_p and *value_p are set to the parameter - * name and value on the line, respectively. If the line is an empty line, - * consisting entirely of whitespace and comments, function returns true - * and *keyp_p and *value_p are set to NULL. - * - * The pointers returned in *key_p and *value_p point to an internal buffer - * that is valid only until the next call of parseRecoveryCommandFile(). - */ - static bool - parseRecoveryCommandFileLine(char *cmdline, char **key_p, char **value_p) - { - char *ptr; - char *bufp; - char *key; - char *value; - static char *buf = NULL; - - *key_p = *value_p = NULL; - - /* - * Allocate the buffer on first use. It's used to hold both the parameter - * name and value. - */ - if (buf == NULL) - buf = malloc(MAXPGPATH + 1); - bufp = buf; - - /* Skip any whitespace at the beginning of line */ - for (ptr = cmdline; *ptr; ptr++) - { - if (!isspace((unsigned char) *ptr)) - break; - } - /* Ignore empty lines */ - if (*ptr == '\0' || *ptr == '#') - return true; - - /* Read the parameter name */ - key = bufp; - while (*ptr !isspace((unsigned char) *ptr) - *ptr != '=' *ptr != '\'') - *(bufp++) = *(ptr++); - *(bufp++) = '\0'; - - /* Skip to the beginning quote of the parameter value */ - ptr = strchr(ptr, '\''); - if (!ptr) - return false; - ptr++; - - /* Read the parameter value to *bufp. Collapse any '' escapes as we go. */ - value = bufp; - for (;;) - { - if (*ptr == '\'') - { - ptr++; - if (*ptr == '\'') - *(bufp++) = '\''; - else - { - /* end of parameter */ - *bufp = '\0'; - break; - } - } - else if (*ptr == '\0') - return false; /* unterminated quoted string */ - else - *(bufp++) = *ptr; - - ptr++; - } - *(bufp++) = '\0'; - - /* Check that there's no garbage after the value */ - while (*ptr) - { - if (*ptr == '#') - break; - if (!isspace((unsigned char) *ptr)) - return false; - ptr++; - } - - /* Success! */ - *key_p = key; - *value_p = value; - return true; - } - - /* * See if there is a recovery command file (recovery.conf), and if so * read in parameters for archive recovery and XLOG streaming. * --- 5019,5024 *** *** 5147,5153 readRecoveryCommandFile(void) char *tok1; char *tok2; ! if (!parseRecoveryCommandFileLine(cmdline, tok1, tok2)) { syntaxError = true; break; --- 5054,5060 char *tok1; char *tok2; ! if (!cfParseOneLine(cmdline, tok1, tok2)) { syntaxError = true; break; *** a/src/backend/utils/misc/Makefile --- b/src/backend/utils/misc/Makefile *** *** 15,21 include $(top_builddir)/src/Makefile.global override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o \ !rbtree.o # This location might depend on the installation directories. Therefore # we can't subsitute it into pg_config.h. 
--- 15,21 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o \ !rbtree.o cfparser.o # This location might depend on the installation directories. Therefore # we can't subsitute it into pg_config.h. *** /dev/null --- b/src/backend/utils/misc/cfparser.c *** *** 0 --- 1,114 + /*- + * + * cfparser.c + * Function for parsing RecoveryCommandFile and Extension Control files + * + * This very simple file format (line oriented, variable = 'value') is used + * as the recovery.conf and the extension control file format. + * + * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/backend/utils/misc/cfparser.c + * + *- + */ + + #include postgres.h + + /* + * Parse one line
Re: [HACKERS] Issues with two-server Synch Rep
Absolutely. For a synch standby, you can't tolerate any standby delay at all. This means that anywhere from 1/4 to 3/4 of queries on the standby would be cancelled on any high-traffic OLTP server. Hence, useless. Don't agree with your numbers there and you seem to be assuming no workarounds would be in use. A different discussion, I think. The only viable workaround would be to delay vacuum on the master, no? Adding the feedback channel looks trivial to me, once we've got the main sync rep patch in. I'll handle that. OK. I thought it would be a major challenge. Ideally, we'd have some way to turn snapshot publication on or off; for a standby which was not receiving reads, we wouldn't need it. Maybe make it contingent on the status of hot_standby GUC? That would make sense. For this reason, I've removed the apply mode from my patch, for now. I want to get the simplest possible patch agreed and then add things later. Um, ok. apply mode is still useful for users who are not running queries against the standby. Which many won't. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: extensible enums
On 10/19/2010 12:21 AM, Andrew Dunstan wrote: On 10/18/2010 10:52 AM, Tom Lane wrote: We could possibly deal with enum types that follow the existing convention if we made the cache entry hold a list of all the original, known-to-be-sorted OIDs. (This could be reasonably compact and cheap to probe if it were represented as a starting OID and a Bitmapset of delta values, since we can assume that the initial set of OIDs is pretty close together.) But we have to have that cache entry, and we have to consult it on every single comparison, so it's definitely going to be slower than before. So I'm thinking the comparison procedure goes like this: 1. Both OIDs even? If so, just compare them numerically, and we're done. 2. Lookup cache entry for enum type. 3. Both OIDs in list of known-sorted OIDs? If so, just compare them numerically, and we're done. 4. Search the part of the cache entry that lists sort positions. If not both present, refresh the cache entry. If still not present, throw error. 5. Compare by sort positions. Step 4 is the slowest part but would be avoided in most cases. However, step 2 is none too speedy either, and would usually be required when dealing with pre-existing enums. OK, I've made adjustments that I think do what you're suggesting. I've discovered and fixed a couple more bugs in this. I have one or two more things to fix and then I'll send a new patch. Meanwhile, I've been testing a database that was upgraded from 9.0, so it has a lot of odd-numbered Oids. It's not really clear from performance testing that the bitmap is a huge win, or even a win at all. (Of course, my implementation might suck too.) I'll continue testing. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
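For anyone joining the thread here, a compressed sketch of the comparison procedure under discussion; every structure and helper name below is invented (the real patch works against the type cache), and the sketch only fixes the control flow of the five steps quoted above:

#include <stdbool.h>

typedef unsigned int Oid;

/* Invented stand-in for the per-enum-type cache entry. */
typedef struct EnumCacheLike
{
    bool (*in_sorted_set) (const struct EnumCacheLike *, Oid);  /* bitmap probe */
    int  (*sort_position) (const struct EnumCacheLike *, Oid);  /* binary search */
} EnumCacheLike;

static int
enum_cmp_sketch(Oid a, Oid b, const EnumCacheLike *cache)
{
    /* 1. Both OIDs even?  Even OIDs are handed out in sort order, so a
     *    plain numeric comparison settles it without touching the cache. */
    if ((a & 1) == 0 && (b & 1) == 0)
        return (a < b) ? -1 : (a > b);

    /* 2. (The cache entry for the enum type was looked up by the caller.) */

    /* 3. Both in the bitmap of original, known-sorted OIDs?  Numeric
     *    comparison is still valid. */
    if (cache->in_sorted_set(cache, a) && cache->in_sorted_set(cache, b))
        return (a < b) ? -1 : (a > b);

    /* 4./5. Fall back to stored sort positions; the real code refreshes
     *       the cache entry and errors out if an OID is still missing. */
    {
        int pa = cache->sort_position(cache, a);
        int pb = cache->sort_position(cache, b);

        return (pa < pb) ? -1 : (pa > pb);
    }
}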
[HACKERS] Location for makeWholeRowVar()?
The cause of bug #5716 is that preprocess_targetlist is trying to get away with creating a whole-row variable marked with type RECORDOID, even in cases where a specific composite type is known for the referenced RTE. This fails because now we might have non-equal Vars representing the same RTE attribute in different places in the same query. I think the best fix for this is to have one and only one function generating whole-row Vars, which means pulling out the guts of the parser's transformWholeRowRef() function into a function that looks like the attached. Now I'm wondering where to put it. I'm tempted to put it into makefuncs.c next to makeVar(), and it fits there without needing to add any new #includes; but it looks a tad more complicated than most other functions in makefuncs.c. Does anyone think this is a bad place for it, and if so where would you put it instead? regards, tom lane

/*
 * makeWholeRowVar -
 *    creates a Var node representing a whole row from the specified RTE
 */
Var *
makeWholeRowVar(RangeTblEntry *rte, Index varno, Index varlevelsup)
{
    Var        *result;
    Oid         toid;

    switch (rte->rtekind)
    {
        case RTE_RELATION:
            /* relation: the rowtype is a named composite type */
            toid = get_rel_type_id(rte->relid);
            if (!OidIsValid(toid))
                elog(ERROR, "could not find type OID for relation %u",
                     rte->relid);
            result = makeVar(varno, InvalidAttrNumber, toid, -1, varlevelsup);
            break;
        case RTE_FUNCTION:
            toid = exprType(rte->funcexpr);
            if (type_is_rowtype(toid))
            {
                /* func returns composite; same as relation case */
                result = makeVar(varno, InvalidAttrNumber, toid, -1,
                                 varlevelsup);
            }
            else
            {
                /*
                 * func returns scalar; instead of making a whole-row Var,
                 * just reference the function's scalar output.  (XXX this
                 * seems a tad inconsistent, especially if "f.*" was
                 * explicitly written ...)
                 */
                result = makeVar(varno, 1, toid, -1, varlevelsup);
            }
            break;
        case RTE_VALUES:
            toid = RECORDOID;
            /* returns composite; same as relation case */
            result = makeVar(varno, InvalidAttrNumber, toid, -1, varlevelsup);
            break;
        default:
            /*
             * RTE is a join or subselect.  We represent this as a whole-row
             * Var of RECORD type.  (Note that in most cases the Var will be
             * expanded to a RowExpr during planning, but that is not our
             * concern here.)
             */
            result = makeVar(varno, InvalidAttrNumber, RECORDOID, -1,
                             varlevelsup);
            break;
    }

    return result;
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
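Wherever it lands, the payoff is that call sites stop hard-wiring RECORDOID; the intended call pattern would be roughly this (a sketch of a caller, not the committed fix):

/* e.g. in preprocess_targetlist: build the whole-row reference through the
 * single constructor, so identical references yield identical Vars */
RangeTblEntry *rte = rt_fetch(result_relation, parse->rtable);
Var        *var = makeWholeRowVar(rte, result_relation, 0);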
Re: [HACKERS] knngist plans
On Sat, Oct 16, 2010 at 8:42 PM, Oleg Bartunov o...@sai.msu.su wrote: Marios, you're right. There are several reasons for the poor documentation but, of course, no excuse; we do need good docs anyway! It's very nice that you're willing to write some, since documentation is always better when seen from outside of development. I think it'd be better to use the wiki, so other people can join in. Pictures would be nice, but I don't know if there is consensus about using pictures in documentation. Wiki sounds good. Where can I find it? Oleg BTW, we're thinking in the background about a new indexing infrastructure - Spatial GiST - but it's a rather big project, so we need sponsors. Is it related to SP-GIST (http://www.cs.purdue.edu/spgist/)?
Re: [HACKERS] Timeline in the light of Synchronous replication
On Mon, Oct 18, 2010 at 4:31 AM, Fujii Masao masao.fu...@gmail.com wrote: But, even though we will have done that, it should be noted that WAL in A might be ahead of that in B. For example, A might crash right after writing WAL to the disk and before sending it to B. So when we restart the old master A as the standby after failover, we should need to delete some WAL files (in A) which are inconsistent with the WAL sequence in B. Right. There's no way to make it categorically safe to turn A into a standby, because there's no way to guarantee that the fsyncs of the WAL happen at the same femtosecond on both machines. What we should be looking for is a reliable way to determine whether or not it is in fact safe. Timelines are intended to provide that, but there are holes, so they don't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] gist DatumGetPointer returns pointer to corrupted data
I index these structures in gist: typedef struct { uint8 type_flag; float8 xi; float8 yi; Timestamp ti; float8 xe; float8 ye; Timestamp te; int32 id; } typ_s_flagged; typedef struct { uint8 type_flag; float8 xl; float8 yl; Timestamp tl; float8 xh; float8 yh; Timestamp th; } typ_b_flagged; typ_s_flagged is the type of leaf entries and typ_b_flagged is for non-leaf entries. This is how I determine which type an entry is in the union, picksplit, penalty, etc. functions (I tried to use GIST_LEAF but it produced errors at execution time! Anyway, I know this might not be best practice, but it is not wrong): GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); // in penalty, consistent // or GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0); entry = entryvec->vector[i]; // in union and picksplit uint8 *type_flag = (uint8 *) DatumGetPointer(entry->key); if (*type_flag == 0) { typ_s_flagged *p1 = (typ_s_flagged *) DatumGetPointer(entry->key); } else if (*type_flag == 1) { typ_b_flagged *p2 = (typ_b_flagged *) DatumGetPointer(entry->key); } The problem is that when I access p1->te or p1->id or p2->th the value I get is zeros, for both. But I get correct values for the fields before p1->te. I checked my code multiple times and I didn't find a mistake like a bad size in palloc or a wrong variable assignment. I checked the compress function and it seems to accept and return correct values. Does anyone have any idea on how to solve this? Or why it happens?
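One cause worth ruling out (an educated guess from the symptoms, not a confirmed diagnosis): with a uint8 first member, the compiler inserts seven padding bytes before the first float8, so sizeof(typ_s_flagged) is 64 on a typical 64-bit platform even though the field sizes only sum to 53. If the type's internallength in CREATE TYPE, or any copy length, was computed by summing the fields, then precisely the tail members (te, id and th) would come back truncated to zeros while everything before them survives. A standalone check:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

typedef double  float8;     /* stand-ins for the PostgreSQL typedefs */
typedef int64_t Timestamp;

typedef struct
{
    uint8_t     type_flag;  /* 7 padding bytes follow on most ABIs */
    float8      xi, yi;
    Timestamp   ti;
    float8      xe, ye;
    Timestamp   te;
    int32_t     id;
} typ_s_flagged;

int
main(void)
{
    printf("sum of field sizes: %zu\n",
           sizeof(uint8_t) + 4 * sizeof(float8) +
           2 * sizeof(Timestamp) + sizeof(int32_t));            /* 53 */
    printf("sizeof(typ_s_flagged): %zu\n", sizeof(typ_s_flagged)); /* 64 */
    printf("offsetof(te) = %zu, offsetof(id) = %zu\n",
           offsetof(typ_s_flagged, te), offsetof(typ_s_flagged, id));
    return 0;
}

If these sizes disagree with what the compress function or the CREATE TYPE declaration assumes, copying 53 bytes would clip te and drop id entirely, which matches the report.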
Re: [HACKERS] Simplifying replication
On Tue, Oct 19, 2010 at 9:16 AM, Josh Berkus j...@agliodbs.com wrote: Well, one thing to be addressed is separating the PITR functionality from replication. PITR needs a lot of features -- timelines, recovery stop points, etc. -- which replication doesn't need or want. I think that focussing on streaming replication functionality and ignoring the archive logs case is probably the best way to logically separate these two. Presumably anyone who needs archive logs as well will be a professional DBA. The way things stand you *always* need archived logs. Even if you have streaming set up it might try to use archived logs if it falls too far behind. Also all the features PITR needs are needed by replication as well. Recovery stop points are absolutely critical. Otherwise if your replica crashed it would have to start over from the original clone time and replay all logs since then. Timelines are not as obvious but perhaps that's our own mistake. When you fail over to your replica shouldn't the new master get a new timelineid? Isn't that the answer to the failure case when a slave finds it's ahead of the master? If it has already replayed logs from a different timelineid in the same lsn range then it can't switch timelines to follow the new master. But if it hasn't then it can. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
Greg, The way things stand you *always* need archived logs. Even if you have streaming set up it might try to use archived logs if it falls too far behind. Actually, you don't. If you're willing to accept possible desynchronization and recloning of the standbys, then you can skip the archive logs. Timelines are not as obvious but perhaps that's our own mistake. When you fail over to your replica shouldn't the new master get a new timelineid? Isn't that the answer to the failure case when a slave finds it's ahead of the master? If it has already replayed logs from a different timelineid in the same lsn range then it can't switch timelines to follow the new master. But if it hasn't then it can. Oh? Do we have this information (i.e. what LSNs are associated with which timeline)? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Mon, Oct 18, 2010 at 6:05 PM, Greg Stark gsst...@mit.edu wrote: On Mon, Oct 18, 2010 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote: It'd be kinda cool if we had it, but the work required to get there seems far out of proportion to the benefits ... I agree. I think that's backing into the problem from the wrong end. The limiting factor here is that we require the entire cluster to be replicated. If you could replicate individual tables/schemas, then this problem disappears. Of course, that's not easy either, but if you're going to solve a really hard problem, you might as well pick that one. That seems like an orthogonal issue. You can have a replica with fewer objects -- which requires ignoring parts of the logs -- or more objects -- which requires being able to do local modifications and have local transaction states independent of the master. They're both potentially useful features and neither replaces the need for the other. Simon talked about filtering the transaction logs a while back and got a lot of pushback. But the more uses we put the slaves to the more it will come up. I figure we'll eventually do something for this though we might want more experience with the current setup before we dive into it. Adding extra objects in the slaves sounds massively harder. The idea of using temp tables might be a useful simplification because in the general case what you want is a separate set of xids with a separate clog and wal. That sounds like it would be a lot more complex. Well, temp tables really want a separate set of XIDs with a separate CLOG, too. Admittedly, they don't necessarily need WAL, if you can make them work without catalog entries, but that's not so easy either. Another possibility though is to use the MED stuff. If we could support creating tables locally that were actually hosted by a separate database then you could do updates, inserts, etc against those tables since they're actually happening in the remote database. Alternately you set up your slave to be the target of remote tables in a read-write data warehouse database. But this approach has limitations of its own until our MED implementation is a lot more powerful than it is today. Agreed on all points. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
On Tue, Oct 19, 2010 at 12:09 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Tom Lane t...@sss.pgh.pa.us writes: You could argue that either way I guess. The script knows what it needs, but OTOH just about every extension there is will probably be generating useless NOTICEs unless something is done, so maybe the extension management code should take care of it for them. Either way is the key here too, so please find attached a revised (v5) patch which will force log_min_messages and client_min_messages to WARNING while the script is run. It seems good to do this in the normal case, but (1) if client_min_messages was already set higher than WARNING, we probably should not lower it and (2) we might want to have a way to lower it for troubleshooting purposes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote: On 10/19/2010 09:06 AM, Greg Smith wrote: I think Magnus's idea to bump the default to 5 triages the worst of the annoyance here, without dropping the feature (which has uses) or waiting for new development to complete. I'd be in favor of just committing that change right now, before it gets forgotten about, and then if nobody else gets around to further work at least something improved here for 9.1. Heck, even *I* could write that patch, if we're agreed. Although you can commit it. Setting max_wal_senders to a non-zero value causes additional work to be done every time a transaction commits, aborts, or is prepared. It's possible that this overhead is too trivial to be worth worrying about; I haven't looked at it closely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Tue, Oct 19, 2010 at 02:52:01PM -0400, Robert Haas wrote: Well, temp tables really want a separate set of XIDs with a separate CLOG, too. Admittedly, they don't necessarily need WAL, if you can make them work without catalog entries, but that's not so easy either. At one point there was the idea to have a sort of permanent temporary tables which would have a pg_class entry but each session would have its own copy. Replicated slaves would then also be able to use this construction. Doesn't help with the XIDs though. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle signature.asc Description: Digital signature
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Tue, Oct 19, 2010 at 3:01 PM, Martijn van Oosterhout klep...@svana.org wrote: On Tue, Oct 19, 2010 at 02:52:01PM -0400, Robert Haas wrote: Well, temp tables really want a separate set of XIDs with a separate CLOG, too. Admittedly, they don't necessarily need WAL, if you can make them work without catalog entries, but that's not so easy either. At one point there was the idea to have a sort of permanent temporary tables which would have a pg_class entry but each session would have its own copy. Replicated slaves would then also be able to use this construction. Doesn't help with the XIDs though. Hmm... yeah, I think I was the one who proposed that, actually. :-) The trick is that it would require us to have two pg_class tables, two pg_attribute tables, two pg_attrdef tables, etc.: in each case, one permanent and one temporary. I am not sure how complex that will turn out to be. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Serializable snapshot isolation patch
On Mon, 2010-10-18 at 22:12 -0500, Kevin Grittner wrote: Hmmm... When Joe was looking at the patch he exposed an intermittent problem with btree indexes which turned out to be related to improper handling of the predicate locks during index page clean-up caused by a vacuum. Easy to fix once found, but it didn't always happen, even with identical runs. (I'm guessing that was due to the randomness in picking a page to split when inserting into a group of identical keys.) Perhaps a similar bug lurks in the GiST predicate locking. I briefly looked into this when I woke up this morning, and I think I'm close. I can reproduce it every time, so I should be able to fix this as soon as I can find some free time (tomorrow night, probably). I might also be able to help with the 2PC issue, but it will be at least a week or two before I have the free time to dig into that one. Eventually, we may need to keep statistics about the number of conflicts happening, and start to rate-limit new serializable transactions (effectively reducing parallelism) to ensure that reasonable progress can be made (hopefully faster than serial execution). Ah, you've exposed just how self-serving my interest in an admission control policy mechanism is! ;-) http://archives.postgresql.org/pgsql-hackers/2009-12/msg02189.php Cool! I also *really* hope to add the SERIALIZABLE READ ONLY DEFERRABLE mode so that pg_dump and other read-only transactions don't push things into a state where the rollback rate spikes: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01771.php One potential issue there is starvation. I don't see how you would guarantee that there will ever be a point to grab a safe snapshot, even if all of the other transactions are completing. After mulling it over in greater detail the previously, I see your point. I'll think about it some more, but this particular idea might be a dead end. I didn't quite mean that it's a dead-end. It still seems tempting to at least try to get a safe snapshot, especially for a transaction that you know will be long-running. Maybe a timeout or something? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo t...@laurenzo.org wrote: Agreed. BSON was born out of implementations that either lacked arbitrary precision numbers or had a strong affinity to an int/floating point way of thinking about numbers. I believe that if BSON had an arbitrary precision number type, it would be a proper superset of JSON. As an aside, the max range of an int in BSON is 64 bits. Back to my original comment that BSON was grown instead of designed, it looks like both the 32bit and 64bit integers were added late in the game and that the original designers perhaps were just going to store all numbers as double. Perhaps we should enumerate the attributes of what would make a good binary encoding? I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. I am also wondering how this proposed binary encoding scheme will interact with TOAST. If the datum is compressed on disk, you'll have to decompress it anyway to do anything with it; at that point, is there still going to be a noticeable speed-up from using the binary encoding? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Robert Haas robertmh...@gmail.com writes: On Tue, Oct 19, 2010 at 12:09 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Either way is the key here too, so please find attached a revised (v5) patch which will force log_min_messages and client_min_messages to WARNING while the script is run. It seems good to do this in the normal case, but (1) if client_min_messages was already set higher than WARNING, we probably should not lower it and (2) we might want to have a way to lower it for troubleshooting purposes. I think the standard way of troubleshooting would be to run the extension's script by hand, no? So while I agree with (1), I'm not sure we need to sweat about (2). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist - 0.8
2010/10/19 Teodor Sigaev teo...@sigaev.ru: Thinking about it that way, perhaps we could add an integer column amop_whats_it_good_for that gets used as a bit field. That wouldn't require changing the index structure, although it might break some other things. OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH | ORDER } [, ...] ] It's a very desirable thing to be able to distinguish roles in the consistent method of GiST: computation of distance could be very expensive. The only way to provide it in the current GiST interface is a strategy number. Of course, we could add a 6th argument to consistent to indicate the role, but I don't think that's a good decision. To me, adding an additional argument (or maybe providing a whole separate support function, separate from consistent) seems quite natural, because now you have a way to pass all the other little bits that might matter... ASC/DESC, NULLS FIRST/LAST, collation OID, etc. You can define the additional argument as providing all of the extra info about how the operator is being used, and, if it's being used for ordering, the details of the requested order. What is your thinking on the matter? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
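A sketch of what a separate ordering support function could look like, per the suggestion above. The signature and the function name are speculative illustrations, not a committed GiST API as of this thread:

#include "postgres.h"
#include "fmgr.h"
#include "access/gist.h"
#include "access/skey.h"

Datum my_gist_distance(PG_FUNCTION_ARGS);

/* Hypothetical ordering support function: returns the distance between
 * the key in *entry and the query.  For internal pages this should be a
 * lower bound over the whole subtree so ordered scans stay correct. */
Datum
my_gist_distance(PG_FUNCTION_ARGS)
{
    GISTENTRY      *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    Datum           query = PG_GETARG_DATUM(1);
    StrategyNumber  strategy = (StrategyNumber) PG_GETARG_UINT16(2);
    float8          dist = 0.0;

    /* ... compute dist from entry->key, query, and strategy here ... */
    (void) entry; (void) query; (void) strategy;

    PG_RETURN_FLOAT8(dist);
}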
Re: [HACKERS] Issues with two-server Synch Rep
On Tue, 2010-10-19 at 09:36 -0700, Josh Berkus wrote: Absolutely. For a synch standby, you can't tolerate any standby delay at all. This means that anywhere from 1/4 to 3/4 of queries on the standby would be cancelled on any high-traffic OLTP server. Hence, useless. Don't agree with your numbers there and you seem to be assuming no workarounds would be in use. A different discussion, I think. The only viable workaround would be to delay vacuum on the master, no? Sure. It's a documented workaround. Adding the feedback channel looks trivial to me, once we've got the main sync rep patch in. I'll handle that. OK. I thought it would be a major challenge. Ideally, we'd have some way to turn snapshot publication on or off; for a standby which was not receiving reads, we wouldn't need it. Maybe make it contingent on the status of hot_standby GUC? That would make sense. Yes, I thought to extend hot_standby parameter to have 3 modes: off (default) | on | feedback For this reason, I've removed the apply mode from my patch, for now. I want to get the simplest possible patch agreed and then add things later. Um, ok. apply mode is still useful for users who are not running queries against the standby. Which many won't. I agree many people won't use the standby for reads. Why then would they care about the difference between fsync and apply? Anyway, that suggestion is mainly so we can reach agreement on a minimal patch, not suggesting it as a limit on released functionality. -- Simon Riggs http://www.2ndquadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gist DatumGetPointer returns pointer to corrupted data
On Tue, Oct 19, 2010 at 1:46 PM, Marios Vodas mvo...@gmail.com wrote: I index these structures in gist:

typedef struct
{
    uint8     type_flag;
    float8    xi;
    float8    yi;
    Timestamp ti;
    float8    xe;
    float8    ye;
    Timestamp te;
    int32     id;
} typ_s_flagged;

typedef struct
{
    uint8     type_flag;
    float8    xl;
    float8    yl;
    Timestamp tl;
    float8    xh;
    float8    yh;
    Timestamp th;
} typ_b_flagged;

typ_s_flagged is the type of leaf entries and typ_b_flagged is for non-leaf entries. This is how I determine which type it is in the functions union, picksplit, penalty, etc. (I tried to use GIST_LEAF but it produced errors at execution time! Anyway, I know this might not be a best practice, but it is not wrong.)

GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);  /* in penalty, consistent */
/* or */
GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
entry = entryvec->vector[i];  /* in union and picksplit */

uint8 *type_flag = (uint8 *) DatumGetPointer(entry->key);
if (*type_flag == 0)
{
    typ_s_flagged *p1 = (typ_s_flagged *) DatumGetPointer(entry->key);
}
else if (*type_flag == 1)
{
    typ_b_flagged *p2 = (typ_b_flagged *) DatumGetPointer(entry->key);
}

The problem is that when I access p1->te or p1->id or p2->th, the value I get is zeros, for both. But I get correct values for variables before p1->te. I checked my code multiple times and I didn't find a mistake like a bad size in palloc or a wrong variable assignment. I checked the compress function and it seems to accept and return correct values. Does anyone have any idea on how to solve this? Or why it happens? Is pg_type.typlen set correctly? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
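One plausible culprit, offered as speculation only: alignment padding. Timestamp is an int64, so the compiler inserts 7 pad bytes after the uint8 flag; if typlen or the stored datum size was computed as the packed sum of the field sizes (53 bytes) rather than sizeof (64 on common 64-bit ABIs), the trailing fields would read back as zeros. A standalone check:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t Timestamp;      /* stand-ins for the Postgres typedefs */
typedef double  float8;
typedef uint8_t uint8;
typedef int32_t int32;

typedef struct
{
    uint8     type_flag;        /* 7 bytes of padding follow on most ABIs */
    float8    xi;
    float8    yi;
    Timestamp ti;
    float8    xe;
    float8    ye;
    Timestamp te;
    int32     id;
} typ_s_flagged;

int
main(void)
{
    /* if typlen or the palloc size was computed as 53, te and id are lost */
    printf("sizeof(typ_s_flagged) = %zu (packed sum of fields is 53)\n",
           sizeof(typ_s_flagged));
    printf("offsetof te = %zu, id = %zu\n",
           offsetof(typ_s_flagged, te), offsetof(typ_s_flagged, id));
    return 0;
}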
Re: [HACKERS] max_wal_senders must die
Robert Haas robertmh...@gmail.com writes: On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote: On 10/19/2010 09:06 AM, Greg Smith wrote: I think Magnus's idea to bump the default to 5 triages the worst of the annoyance here, without dropping the feature (which has uses) or waiting for new development to complete. Setting max_wal_senders to a non-zero value causes additional work to be done every time a transaction commits, aborts, or is prepared. Yes. This isn't just a numeric parameter; it's also a boolean indicating do I want to pay the overhead to be prepared to be a replication master?. Josh has completely failed to make a case that that should be the default. In fact, the system would fail to start at all if we just changed the default for max_wal_senders and not the default for wal_level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Oct 19, 2010, at 12:17 PM, Robert Haas wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. Maybe someone has numbers on that for the XML type? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo t...@laurenzo.org wrote: Perhaps we should enumerate the attributes of what would make a good binary encoding? Not sure if we're discussing the internal storage format or the binary send/recv format, but in my humble opinion, some attributes of a good internal format are: 1. Lightweight - it'd be really nice for the JSON datatype to be available in core (even if extra features like JSONPath aren't). 2. Efficiency - Retrieval and storage of JSON datums should be efficient. The internal format should probably closely resemble the binary send/recv format so there's a good reason to use it. A good attribute of the binary send/recv format would be compatibility. For instance, if MongoDB (which I know very little about) has binary send/receive, perhaps the JSON data type's binary send/receive should use it. Efficient retrieval and update of values in a large JSON tree would be cool, but would be rather complex, and IMHO, overkill. JSON's main advantage is that it's sort of a least common denominator of the type systems of many popular languages, making it easy to transfer information between them. Having hierarchical key/value store support would be pretty cool, but I don't think it's what PostgreSQL's JSON data type should do. On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. Speculation: the overhead involved with retrieving/sending and receiving/storing JSON (not to mention TOAST compression/decompression) will be far greater than that of serializing/unserializing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
Hi, Boszormenyi Zoltan wrote: There is one problem with the patch, it doesn't survive make check. One of the regression tests fails the Assert(!cur_em->em_is_child); line in process_equivalence() in equivclass.c, but I couldn't yet find what causes it. The why is vaguely clear: something modifies the ec_members list in the eq_classes' tree nodes while the node is in the tree. Because I didn't find the offender yet, I couldn't fix it, so I send this patch as is. I'll try to fix it if someone doesn't beat me to it. :) I am a little closer to this bug, maybe I have even found the cause of it. I found that process_equivalence() is called from: path/equivclass.c: reconsider_outer_join_clause() reconsider_full_join_clause() plan/initsplan.c: distribute_qual_to_rels() The problem is with the two functions in path/equivclass.c, as process_equivalence() and those functions all walk the tree, and the current RBTree code can only deal with one walk at a time. We need to push/pop the iterator state to be able to serve more than one walker. Also, we need to split out the tree-modifying part from process_equivalence() somehow, as the tree walking also cannot deal with node additions and deletions. Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Excerpts from Dimitri Fontaine's message of mar oct 19 13:09:47 -0300 2010: Tom Lane t...@sss.pgh.pa.us writes: You could argue that either way I guess. The script knows what it needs, but OTOH just about every extension there is will probably be generating useless NOTICEs unless something is done, so maybe the extension management code should take care of it for them. Either way is the key here too, so please find attached a revised (v5) patch which will force log_min_messages and client_min_messages to WARNING while the script is run. I think you should pstrdup the return value from GetConfigOption. (As a matter of style, I'd name the local vars differently, so that they don't show up when you search the code for the global vars; perhaps old_cmsgs and old_lmsgs would do, for example.) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
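A minimal sketch of the save/restore pattern with the pstrdup fix Álvaro suggests; it assumes the 9.0-era one-argument GetConfigOption() signature, and the wrapper function name is invented:

#include "postgres.h"
#include "utils/guc.h"

static void
run_script_quietly(void)
{
    /* pstrdup, since the pointers may not survive the SetConfigOption calls */
    char *old_cmsgs = pstrdup(GetConfigOption("client_min_messages"));
    char *old_lmsgs = pstrdup(GetConfigOption("log_min_messages"));

    SetConfigOption("client_min_messages", "warning", PGC_SUSET, PGC_S_SESSION);
    SetConfigOption("log_min_messages", "warning", PGC_SUSET, PGC_S_SESSION);

    /* ... execute the extension script here; real code would restore the
     * old values in a PG_CATCH block as well ... */

    SetConfigOption("client_min_messages", old_cmsgs, PGC_SUSET, PGC_S_SESSION);
    SetConfigOption("log_min_messages", old_lmsgs, PGC_SUSET, PGC_S_SESSION);
    pfree(old_cmsgs);
    pfree(old_lmsgs);
}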
Re: [HACKERS] plan time of MASSIVE partitioning ...
Boszormenyi Zoltan z...@cybertec.at writes: The problem is with the two functions in path/equivclass.c, as process_equivalence() and those functions all walk the tree, and the current RBTree code can only deal with one walk at a time. We need to push/pop the iterator state to be able to serve more than one walker. Good luck with that --- the iteration state is embedded in the rbtree. Also, we need to split out the tree-modifying part from process_equivalence() somehow, as the tree walking also cannot deal with node additions and deletions. That's not happening either. Maybe you need to think of some other data structure to use. Hashtable maybe? dynahash.c at least has reasonably well-defined limitations in this area. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
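For illustration, a minimal dynahash setup of the sort being pointed at here; MyEcEntry and the table name are invented, but hash_create() and hash_search() are the real dynahash entry points, and a lookup-or-insert stays well-defined even if other entries are added between calls:

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct MyEcEntry
{
    Oid   em_oid;       /* hash key; must be the first field */
    int   sort_order;
} MyEcEntry;

static HTAB *
make_ec_cache(void)
{
    HASHCTL ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(MyEcEntry);
    ctl.hash = oid_hash;
    return hash_create("EC member cache", 256, &ctl,
                       HASH_ELEM | HASH_FUNCTION);
}

static MyEcEntry *
ec_cache_enter(HTAB *tab, Oid oid)
{
    bool        found;
    MyEcEntry  *ent;

    ent = (MyEcEntry *) hash_search(tab, &oid, HASH_ENTER, &found);
    if (!found)
        ent->sort_order = -1;   /* initialize a newly created entry */
    return ent;
}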
Re: [HACKERS] max_wal_senders must die
On Tue, Oct 19, 2010 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote: On 10/19/2010 09:06 AM, Greg Smith wrote: I think Magnus's idea to bump the default to 5 triages the worst of the annoyance here, without dropping the feature (which has uses) or waiting for new development to complete. Setting max_wal_senders to a non-zero value causes additional work to be done every time a transaction commits, aborts, or is prepared. Yes. This isn't just a numeric parameter; it's also a boolean indicating do I want to pay the overhead to be prepared to be a replication master?. Josh has completely failed to make a case that that should be the default. In fact, the system would fail to start at all if we just changed the default for max_wal_senders and not the default for wal_level. regards, tom lane If the variable is altered such that it is dynamic, could it not be updated by the postmaster when a connection attempts to begin replicating? -- Rob Wultsch wult...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist plans
On Tue, 19 Oct 2010, Marios Vodas wrote: On Sat, Oct 16, 2010 at 8:42 PM, Oleg Bartunov o...@sai.msu.su wrote: Marios, you're right. There are several reasons for poor documentation, but of course, no excuse, we do need good docs anyway! It's very nice you're willing to write one, since it's always better seen from outside of development. I think it'd be better to use the wiki, so other people can join. Pictures would be nice, but I don't know if there is consensus about using pictures in documentation. Wiki sounds good. Where can I find it? Let's start with this page: http://wiki.postgresql.org/wiki/GiST I think this page should be the index page for all GiST documentation. Oleg btw, we're thinking in the background about new indexing infrastructure - Spatial GiST, but it's a rather big project, so we need sponsors. Is it related to SP-GIST (http://www.cs.purdue.edu/spgist/)? sort of. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
On Tue, Oct 19, 2010 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote: On 10/19/2010 09:06 AM, Greg Smith wrote: I think Magnus's idea to bump the default to 5 triages the worst of the annoyance here, without dropping the feature (which has uses) or waiting for new development to complete. Setting max_wal_senders to a non-zero value causes additional work to be done every time a transaction commits, aborts, or is prepared. Yes. This isn't just a numeric parameter; it's also a boolean indicating do I want to pay the overhead to be prepared to be a replication master?. Josh has completely failed to make a case that that should be the default. In fact, the system would fail to start at all if we just changed the default for max_wal_senders and not the default for wal_level. On a slightly tangential note, it would be really nice to be able to change things like wal_level and max_wal_senders on the fly. ISTM that needing to change the setting is actually the smaller portion of the gripe; the bigger annoyance is that you bring the system down, change one setting, bring it back up, take a hot backup, and then realize that you have to shut the system down again to change the other setting, because you forgot to do it the first time. Since the error message you get on the slave side is pretty explicit, the sheer fact of needing to change max_wal_senders is only a minor inconvenience; but the possible need to take down the system a second time is a major one. One of the problems with making these and similar settings changeable on the fly is that, to be safe, we can't declare that the value is increased until we can verify that every still-running backend has picked up the increased value and begun acting upon it. We have no architecture for knowing when that has happened; the postmaster just signals its children and forgets about it. If you had a shared memory array (similar to the ProcArray, but perhaps with a separate set of locks) in which backends wrote their last-known values of GUCs that fall into this category, it would be possible to write a function that returns the answer to the question What is the least value of wal_level that exists in any backend in the system?. Maybe that's not the right architecture; I'm not sure. But we should see if we can figure out something that will work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
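Purely as a sketch of the shared-array idea floated above (every name below is invented and locking is reduced to a comment): each backend publishes the value it is actually acting on, and the minimum across live backends can then be computed:

#include <limits.h>

#define MAX_BACKENDS 128        /* illustrative cap */

typedef struct GucEchoSlot
{
    int pid;                    /* 0 = slot unused */
    int wal_level;              /* value this backend last acted upon */
} GucEchoSlot;

/* in a real patch this array would live in shared memory */
static GucEchoSlot guc_echo[MAX_BACKENDS];

/* "What is the least value of wal_level that exists in any backend?" */
static int
min_active_wal_level(void)
{
    int min = INT_MAX;
    int i;

    /* real code would hold a lock here, much as the ProcArray does */
    for (i = 0; i < MAX_BACKENDS; i++)
    {
        if (guc_echo[i].pid != 0 && guc_echo[i].wal_level < min)
            min = guc_echo[i].wal_level;
    }
    return min;
}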
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 3:40 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. Speculation: the overhead involved with retrieving/sending and receiving/storing JSON (not to mention TOAST compression/decompression) will be far greater than that of serializing/unserializing. I speculate that your speculation is incorrect. AIUI, we, unlike $COMPETITOR, tend to be CPU-bound rather than IO-bound on COPY. But perhaps less speculation and more benchmarking is in order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Yes. This isn't just a numeric parameter; it's also a boolean indicating do I want to pay the overhead to be prepared to be a replication master?. Since this is the first time I've heard of the overhead, it would be hard for me to have taken that into consideration. If there was discussion about this ever, I missed it. That explains why we changed the default in RC1 though, which was a surprise to me. What is the overhead exactly? What specific work do we do? Link? Josh has completely failed to make a case that that should be the default. In fact, the system would fail to start at all if we just changed the default for max_wal_senders and not the default for wal_level. Well, now that you mention it, I also think that hot standby should be the default. Yes, I know about the overhead, but I also think that the number of our users who want easy replication *far* outnumber the users who care about an extra 10% WAL overhead. On a slightly tangential note, it would be really nice to be able to change things like wal_level and max_wal_senders on the fly. This would certainly help solve the problem. Heck, if we could even just change them with a reload (rather than a restart) it would be a vast improvement. ISTM that needing to change the setting is actually the smaller portion of the gripe; the bigger annoyance is that you bring the system down, change one setting, bring it back up, take a hot backup, and then realize that you have to shut the system down again to change the other setting, because you forgot to do it the first time. Since the error message you get on the slave side is pretty explicit, the sheer fact of needing to change max_wal_senders is only a minor inconvenience; but the possible need to take down the system a second time is a major one. You've summed up the problem nicely. I'll note that even though I've already set up 3 production systems using SR, I still mess this up about 1/3 the time and have to restart and reclone. There's just too many settings. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote: The trick is that it would require us to have two pg_class tables, two pg_attribute tables, two pg_attrdef tables, etc.: in each case, one permanent and one temporary. I am not sure how complex that will turn out to be. Tom suggested using inheritance for this. I find it strange to try constructing catalog tables to represent these local definitions which never need to be read by any other backend and in any case are 1:1 copies of the global catalog entries. It seems to me simpler and more direct to just nail relcache entries for these objects into memory and manipulate them directly. They can be constructed from the global catalog tables and then tweaked to point to the backend local temporary tables. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gist DatumGetPointer returns pointer to corrupted data
On Tue, Oct 19, 2010 at 10:23 PM, Robert Haas robertmh...@gmail.com wrote: Is pg_type.typlen set correctly? Are you referring to the table pg_type? If yes, those type structures exist only in C. I didn't write any in/out functions, so they don't exist at the SQL level. I pass a different data type (which exists in SQL) to the compress method, which changes it to typ_s_flagged. The problem occurs in C code. I hope my explanation is clear.
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 2:46 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Oct 19, 2010 at 3:40 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. Speculation: the overhead involved with retrieving/sending and receiving/storing JSON (not to mention TOAST compression/decompression) will be far greater than that of serializing/unserializing. I speculate that your speculation is incorrect. AIUI, we, unlike $COMPETITOR, tend to be CPU-bound rather than IO-bound on COPY. But perhaps less speculation and more benchmarking is in order. After spending a week in the morass of this, I have to say that I am less certain than I was on any front regarding the text/binary distinction. I'll take some time and benchmark different cases. My hypothesis is that a well implemented binary structure and conversions will add minimal overhead in the IO + Validate case which would be the typical in/out flow. It could be substantially faster for binary send/receive because the validation step could be eliminated/reduced. Further, storing as binary reduces the overhead of random access to the data by database functions. I'm envisioning staging this up as follows: 1. Create a jsontext. jsontext uses text as its internal representation. in/out functions are essentially a straight copy or a copy + validate. 2. Create a jsonbinary type. This uses an optimized binary format for internal rep and send/receive. in/out is a parse/transcode operation to standard JSON text. 3. Internal data access functions and JSON Path require a jsonbinary. 4. There are implicit casts to/from jsontext and jsonbinary. I've got a grammar in mind for the binary structure that I'll share later when I've got some more time. It's inspired by $COMPETITOR's format but a little more sane, using type tags that implicitly define the size of the operands, simplifying parsing. I'll then define the various use cases and benchmark using the different types. Some examples include IO No Validate, IO+Validate, Store and Index, Internal Processing, Internal Composition, etc. The answer may be to have both a jsontext and jsonbinary type as each will be optimized for a different case. Make sense? It may be a week before I get through this. Terry
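To make the "type tags that implicitly define the size of the operands" idea concrete, here is an illustrative tag scheme (entirely made up for this sketch, not the actual grammar being alluded to) where the tag value tells the parser how many length bytes follow it:

#include <stdint.h>

enum jsonb_tag                  /* all names invented */
{
    JTAG_NULL   = 0x00,
    JTAG_FALSE  = 0x01,
    JTAG_TRUE   = 0x02,
    JTAG_INT8   = 0x10,         /* tag, then 1-byte integer payload */
    JTAG_INT32  = 0x11,         /* tag, then 4-byte integer payload */
    JTAG_STR8   = 0x20,         /* tag, 1-byte length, then bytes */
    JTAG_STR32  = 0x21,         /* tag, 4-byte length, then bytes */
    JTAG_OBJ32  = 0x30,         /* tag, 4-byte total size, then pairs */
    JTAG_ARR32  = 0x31          /* tag, 4-byte total size, then elements */
};

/* bytes of length/payload header a parser must consume after the tag */
static inline int
jtag_fixed_width(uint8_t tag)
{
    switch (tag)
    {
        case JTAG_INT8:  case JTAG_STR8:  return 1;
        case JTAG_INT32: case JTAG_STR32:
        case JTAG_OBJ32: case JTAG_ARR32: return 4;
        default:                          return 0;
    }
}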
[HACKERS] pg_rawdump
I spent yesterday writing a new tool pg_rawdump (which will be released as open source in due course), which takes the table files in an arbitrary pgsql database, and is able to transform those back into tables (including toast values). In the course of doing this (a customer needed it because he only had a copy of those files, the pg_clog etc. dirs were lost), I noticed that there are two problems which complicate recovery (needlessly): a. It takes some guesswork to find out which file corresponds with which table. b. In order to recover the table content, it is (obviously) necessary to correlate with the original table definition, which obviously is lost with the rest of the information. In order to simplify recovery at this point (enormously), it would have been very helpful (at almost negligible cost), to have the name of the table, the name of the columns, and the types of the columns available. Why don't we insert that data into the first page of a regular table file, in the special data area? I'd be willing to create a patch for that (should be pretty easy), if nobody considers it to be a bad idea. -- Stephen. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
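A hypothetical shape for the proposed metadata, with every name below invented for illustration; a real design would also have to cope with tables whose column list does not fit in the special space of one page:

#include "postgres.h"

#define RAWDUMP_MAX_ATTS 32         /* arbitrary cap for this sketch */

typedef struct RawDumpAttr
{
    char    attname[NAMEDATALEN];
    Oid     atttypid;
    int32   atttypmod;
} RawDumpAttr;

/* would be written into the special area of a table's first page */
typedef struct RawDumpPageSpecial
{
    char        relname[NAMEDATALEN];
    int16       natts;
    RawDumpAttr atts[RAWDUMP_MAX_ATTS];
} RawDumpPageSpecial;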
Re: [HACKERS] Creation of temporary tables on read-only standby servers
2010/10/19 Greg Stark gsst...@mit.edu: On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote: The trick is that it would require us to have two pg_class tables, two pg_attribute tables, two pg_attrdef tables, etc.: in each case, one permanent and one temporary. I am not sure how complex that will turn out to be. Tom suggested using inheritance for this. I find it strange to try constructing catalog tables to represent these local definitions which never need to be read by any other backend and in any case are 1:1 copies of the global catalog entries. It seems to me simpler and more direct to just nail relcache entries for these objects into memory and manipulate them directly. They can be constructed from the global catalog tables and then tweaked to point to the backend local temporary tables. +1 I had a very ugly implementation of global temp tables based just on the relcache. The only problem was with refreshing the relcache. But it's not too easy - for real use it's necessary to override statistics, indexes, and access statistics. I had an idea to modify the data page cache to support permanent (memory-only) pages. Then we could have temporary tuples together with standard tuples in one system table. This would be similar to memory tables in MySQL and could be interesting in cooperation with mmap - very fast access to some tables or pre-read tables. Regards Pavel -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. The elephant in the room is if the binary encoded form is smaller then it occupies less ram and disk bandwidth to copy it around. If your database is large that alone is the dominant factor. Doubling the size of all the objects in your database means halving the portion of the database that fits in RAM and doubling the amount of I/O required to complete any given operation. If your database fits entirely in RAM either way then it still means less RAM bandwidth used which is often the limiting factor but depending on how much cpu effort it takes to serialize and deserialize the balance could shift either way. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
After spending a week in the morass of this, I have to say that I am less certain than I was on any front regarding the text/binary distinction. I'll take some time and benchmark different cases. My hypothesis is that a well implemented binary structure and conversions will add minimal overhead in the IO + Validate case which would be the typical in/out flow. It could be substantially faster for binary send/receive because the validation step could be eliminated/reduced. Further, storing as binary reduces the overhead of random access to the data by database functions. I'm envisioning staging this up as follows: 1. Create a jsontext. jsontext uses text as its internal representation. in/out functions are essentially a straight copy or a copy + validate. 2. Create a jsonbinary type. This uses an optimized binary format for internal rep and send/receive. in/out is a parse/transcode operation to standard JSON text. 3. Internal data access functions and JSON Path require a jsonbinary. 4. There are implicit casts to/from jsontext and jsonbinary. Some years ago I solved similar problems with the xml type. I think you have to take a few factors into account: a) all varlena types are compressed - you cannot fetch or update just an interesting fragment; PG always works with the complete document. b) access to fragments of a JSON or XML document is not really important, because fast access to data is solved via indexes. c) only a few APIs allow binary communication between server and client. Almost all interfaces use only a text-based API. I see a possibly interesting direction for a binary protocol when someone uses a JavaScript driver, or uses PG in some JavaScript server-side environment, but that isn't often used now. Regards Pavel I've got a grammar in mind for the binary structure that I'll share later when I've got some more time. It's inspired by $COMPETITOR's format but a little more sane, using type tags that implicitly define the size of the operands, simplifying parsing. I'll then define the various use cases and benchmark using the different types. Some examples include IO No Validate, IO+Validate, Store and Index, Internal Processing, Internal Composition, etc. The answer may be to have both a jsontext and jsonbinary type as each will be optimized for a different case. Make sense? It may be a week before I get through this. Terry -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg s...@cuci.nl wrote: In order to simplify recovery at this point (enormously), it would have been very helpful (at almost negligible cost), to have the name of the table, the name of the columns, and the types of the columns available. Why don't we insert that data into the first page of a regular table file after in the special data area? I'd be willing to create a patch for that (should be pretty easy), if nobody considers it to be a bad idea. There isn't necessarily one value for these attributes. You can rename columns and that rename may succeed and commit or fail and rollback. You can drop or add columns and some rows will have or not have the added columns at all. You could even add a column, insert some rows, then abort -- all in a transaction. So some (aborted) rows will have extra columns that aren't even present in the current table definition. All this isn't to say the idea you're presenting is impossible or a bad idea. If this meta information was only a hint for forensic purposes and you take into account these caveats it might still be useful. But I'm not sure how useful. I mean, you can't really decipher everything properly without the data in the catalog -- and you have to premise this on the idea that you've lost everything in the catalog but not the data in other tables. Which seems like a narrow use case. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
2010/10/19 Greg Stark gsst...@mit.edu: On Tue, Oct 19, 2010 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. The elephant in the room is if the binary encoded form is smaller then it occupies less ram and disk bandwidth to copy it around. If your database is large that alone is the dominant factor. Doubling the size of all the objects in your database means halving the portion of the database that fits in RAM and doubling the amount of I/O required to complete any given operation. If your database fits entirely in RAM either way then it still means less RAM bandwidth used which is often the limiting factor but depending on how much cpu effort it takes to serialize and deserialize the balance could shift either way. I am not sure if this argument is important for JSON. This format doesn't have big overhead and JSON documents are pretty small. Moreover, from 9.0 TOAST uses relatively aggressive compression. I would like some standardised format for JSON inside PG too, but without using an external library I don't see an advantage to using a format other than text. Regards Pavel -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Greg Stark wrote: On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg s...@cuci.nl wrote: In order to simplify recovery at this point (enormously), it would have been very helpful (at almost negligible cost), to have the name of the table, the name of the columns, and the types of the columns available. Why don't we insert that data into the first page of a regular table file after in the special data area? I'd be willing to create a patch for that (should be pretty easy), if nobody considers it to be a bad idea. There isn't necessarily one value for these attributes. You can rename columns and that rename may succeed and commit or fail and rollback. You can drop or add columns and some rows will have or not have the added columns at all. You could even add a column, insert some rows, then abort -- all in a transaction. So some (aborted) rows will have extra columns that aren't even present in the current table definition. True. All this isn't to say the idea you're presenting is impossible or a bad idea. If this meta information was only a hint for forensic purposes and you take into account these caveats it might still be useful. This is exactly what I meant it for. It would contain the most recent alter table state (give or take some delay due to cache flushes). But I'm not sure how useful. I mean, you can't really decipher everything properly without the data in the catalog -- and you have to premise this on the idea that you've lost everything in the catalog but not the data in other tables. Which seems like a narrow use case. It happens, more often than you'd think. My client had it, I've seen numerous google hits which show the same. -- Stephen. Be braver. You cannot cross a chasm in two small jumps. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Domains versus arrays versus typmods
In bug #5717, Richard Huxton complains that a domain declared like

CREATE DOMAIN mynums numeric(4,2)[1];

doesn't work properly, ie, the typmod isn't enforced in places where it reasonably ought to be. I dug into this a bit, and found that there are more worms in this can than I first expected. In the first place, plpgsql seems a few bricks shy of a load, in that its code to handle assignment to an array element doesn't appear to be considering arrays with typmod at all: it just passes typmod -1 to exec_simple_cast_value() in the PLPGSQL_DTYPE_ARRAYELEM case in exec_assign_value(). Nonetheless, if you try a case like

declare x numeric(4,2)[1];
begin
  x[1] := $1;

you'll find the typmod *is* enforced. It turns out that the reason that it works is that after constructing the modified array value, exec_assign_value() calls itself recursively to do the actual assignment to the array variable --- and that call sees that the array variable has a typmod, so it applies the cast *at the array level*, ie it disassembles the array, re-coerces each element, and builds a new array. So it's horridly inefficient but it works. That could and should be improved, but it seems to be just a matter of local changes in pl_exec.c, and it's only an efficiency hack anyway. The big problem is that none of this happens when the variable is declared as a domain, and I believe that that is a significantly more wide-ranging issue. The real issue as I see it is that it's possible to subscript an array without realizing that the array's type is really a domain that incorporates a typmod specification. This happens because of a hack we did to simplify implementation of domains over arrays: we expose the array element type as typelem of the domain as well. For example, given Richard's declaration we have

regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums';
 typname |  oid  | typelem | typbasetype | typtypmod
---------+-------+---------+-------------+-----------
 mynums  | 92960 |    1700 |        1231 |    262150
(1 row)

If we were to wrap another domain around that, we'd have

regression=# create domain mynums2 as mynums;
CREATE DOMAIN
regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums2';
 typname |  oid  | typelem | typbasetype | typtypmod
---------+-------+---------+-------------+-----------
 mynums2 | 92976 |    1700 |       92960 |        -1
(1 row)

and at this point it's completely unobvious from inspection of the pg_type entry that any typmod needs to be applied when subscripting. So I'm suspicious that there are a boatload of bugs related to this kind of domain declaration, not just the one case in plpgsql. I think that what we ought to do about it is to stop exposing typelem in domains' pg_type rows. If you want to subscript a domain value, you should have to drill down to its base type with getBaseTypeAndTypmod, which would also give you the correct typmod to apply. If we set typelem to zero in domain pg_type rows, it shouldn't take too long to find any places that are missing this consideration --- the breakage will be obvious rather than subtle. This catalog definitional change obviously shouldn't be back-patched, but possibly the ensuing code changes could be, since the typelem change is just to expose where things are wrong and wouldn't be a prerequisite for corrected code to behave correctly. Or we could decide that this is a corner case we're not going to try to fix in back branches. It's been wrong since day 0, so there's certainly an argument that it's not important enough to risk back-patching. Comments? 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
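A sketch of the drill-down described above; getBaseTypeAndTypmod() and get_element_type() are existing lsyscache helpers, while the wrapper function here is invented for illustration:

#include "postgres.h"
#include "utils/lsyscache.h"

/* Resolve a possibly-domain type to the element type to use when
 * subscripting, carrying along the typmod found on the domain. */
static Oid
resolve_subscript_type(Oid declared_type, int32 *elemtypmod)
{
    int32   basetypmod = -1;
    Oid     basetype = getBaseTypeAndTypmod(declared_type, &basetypmod);
    Oid     elemtype = get_element_type(basetype);

    /* the typmod recorded on the domain applies to the elements too */
    *elemtypmod = basetypmod;
    return elemtype;            /* InvalidOid if basetype is not an array */
}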
Re: [HACKERS] Serializable snapshot isolation patch
Jeff Davis pg...@j-davis.com wrote: I briefly looked into this when I woke up this morning, and I think I'm close. I can reproduce it every time, so I should be able to fix this as soon as I can find some free time (tomorrow night, probably). OK, I'll focus on other areas. I might also be able to help with the 2PC issue, but it will be at least a week or two before I have the free time to dig into that one. If I get through the other points you raised, I'll start reading up on 2PC. I didn't quite mean that it's a dead-end. It still seems tempting to at least try to get a safe snapshot, especially for a transaction that you know will be long-running. Maybe a timeout or something? But what would you do when you hit the timeout? One thing that would work, but I really don't think I like it, is that a request for a snapshot for such a transaction would not only block until it could get a clean snapshot (no overlapping serializable non-read-only transactions which overlap serializable transactions which wrote data and then committed in time to be visible to the snapshot being acquired), but it would *also* block *other* serializable transactions, if they were non-read-only, on an attempt to acquire a snapshot. That would at least guarantee that the serializable read only deferrable transaction could get its snapshot as soon as the initial set of problem overlapping transactions completed, but it would be an exception to the "SSI introduces no new blocking" guarantee. :-( I was OK with that for the particular transaction where DEFERRABLE was requested, but to have that block other serializable transactions seems pretty iffy to me. Short of that, I think you would just have to wait for completion of all known problem transactions and then try again. On a server with a heavy write load, particularly if the length of the writing transactions was highly variable, you could go through a lot of iterations before getting that clean snapshot. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Greg Stark gsst...@mit.edu writes: On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote: The trick is that it would require us to have two pg_class tables, two pg_attribute tables, two pg_attrdef tables, etc.: in each case, one permanent and one temporary. I am not sure how complex that will turn out to be. Tom suggested using inheritance for this. I find it strange to try constructing catalog tables to represent these local definitions which never need to be read by any other backend and in any case are 1:1 copies of the global catalog entries. It seems to me simpler and more direct to just nail relcache entries for these objects into memory and manipulate them directly. Relcache entries alone are not gonna work. There is way too much stuff that assumes that tables are correctly represented in the system catalogs. It's possible that you could make it work if you created the child catalogs and immediately filled them with suitable entries describing the child catalogs themselves. Bootstrapping that might be a bit of fun though. The larger issue in all this is that there's so much code that supposes that it just has to scan a particular catalog when it wants an entry, and isn't going to go looking for child tables of the catalog. That's possibly fixable but is not likely to be easy, unless you can somehow hide it within systable_beginscan and related routines. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Terry Laurenzo t...@laurenzo.org writes: After spending a week in the morass of this, I have to say that I am less certain than I was on any front regarding the text/binary distinction. I'll take some time and benchmark different cases. My hypothesis is that a well implemented binary structure and conversions will add minimal overhead in the IO + Validate case which would be the typical in/out flow. It could be substantially faster for binary send/receive because the validation step could be eliminated/reduced. I think that arguments proceeding from speed of binary send/receive aren't worth the electrons they're written on, because there is nothing anywhere that says what the binary format ought to be. In the case of XML we're just using the text representation as the binary format too, and nobody's complained about that. If we were to choose to stick with straight text internally for a JSON type, we'd do the same thing, and again nobody would complain. So, if you want to make a case for using some binary internal format or other, make it without that consideration. I'm envisioning staging this up as follows: 1. Create a jsontext. jsontext uses text as its internal representation. in/out functions are essentially a straight copy or a copy + validate. 2. Create a jsonbinary type. This uses an optimized binary format for internal rep and send/receive. in/out is a parse/transcode operation to standard JSON text. Ugh. Please don't. JSON should be JSON, and nothing else. Do you see any other datatypes in Postgres that expose such internal considerations? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Stephen R. van den Berg s...@cuci.nl writes: In order to simplify recovery at this point (enormously), it would have been very helpful (at almost negligible cost), to have the name of the table, the name of the columns, and the types of the columns available. Why don't we insert that data into the first page of a regular table file after in the special data area? (1) it wouldn't necessarily fit (2) what are you going to do to maintain it during ALTER TABLE? (3) if there are any custom types involved, you're still lost. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Greg Stark gsst...@mit.edu writes: The elephant in the room is if the binary encoded form is smaller then it occupies less ram and disk bandwidth to copy it around. It seems equally likely that a binary-encoded form could be larger than the text form (that's often true for our other datatypes). Again, this is an argument that would require experimental evidence to back it up. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Tue, Oct 19, 2010 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Relcache entries alone are not gonna work. There is way too much stuff that assumes that tables are correctly represented in the system catalogs. Well we're talking about multiple things now. In the global temporary table case they *are* properly represented in the system catalogs. Except for their local state such as the actual relfilenode all the structural attributes are going to be accurate. In the case of tables created locally on a slave, well, that's more complicated. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Terry Laurenzo t...@laurenzo.org writes: After spending a week in the morass of this, I have to say that I am less certain than I was on any front regarding the text/binary distinction. I'll take some time and benchmark different cases. My hypothesis is that a well implemented binary structure and conversions will add minimal overhead in the IO + Validate case which would be the typical in/out flow. It could be substantially faster for binary send/receive because the validation step could be eliminated/reduced. I think that arguments proceeding from speed of binary send/receive aren't worth the electrons they're written on, because there is nothing anywhere that says what the binary format ought to be. In the case of XML we're just using the text representation as the binary format too, and nobody's complained about that. If we were to choose to stick with straight text internally for a JSON type, we'd do the same thing, and again nobody would complain. So, if you want to make a case for using some binary internal format or other, make it without that consideration. I'm envisioning staging this up as follows: 1. Create a jsontext. jsontext uses text as its internal representation. in/out functions are essentially a straight copy or a copy + validate. 2. Create a jsonbinary type. This uses an optimized binary format for internal rep and send/receive. in/out is a parse/transcode operation to standard JSON text. Ugh. Please don't. JSON should be JSON, and nothing else. Do you see any other datatypes in Postgres that expose such internal considerations? regards, tom lane I don't think anyone here was really presenting arguments as yet. We're several layers deep on speculation and everyone is saying that experimentation is needed. I've got my own reasons for going down this path for a solution I have in mind. I had thought that some part of that might have been applicable to pg core, but if not, that's no problem. For my own edification, I'm going to proceed down this path and see where it leads. I'll let the list know what I find out. I can understand the sentiment that JSON should be JSON and nothing else from a traditional database server's point of view, but there is nothing sacrosanct about it in the broader context. Terry
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Greg Stark gsst...@mit.edu writes: On Tue, Oct 19, 2010 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Relcache entries alone are not gonna work. There is way too much stuff that assumes that tables are correctly represented in the system catalogs. Well we're talking about multiple things now. In the global temporary table case they *are* properly represented in the system catalogs. Except for their local state such as the actual relfilenode all the structural attributes are going to be accurate. ... and relpages and reltuples ... it's really not going to be that easy to have a table that isn't described in pg_class. Which the structure you're describing isn't. There might be a template for it in pg_class, but that's something entirely different. In the case of tables created locally on a slave, well, that's more complicated. I think they're more alike than you think. If we had the infrastructure to do local temp tables this way, it'd be pretty easy to use that to instantiate per-backend copies of global temp tables. (The global entities would be templates, not actual tables.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers