Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
On 2014-06-12 00:38:36 -0400, Noah Misch wrote:
> On Tue, Apr 15, 2014 at 03:28:41PM -0400, Tom Lane wrote:
> > Andres Freund <and...@2ndquadrant.com> writes:
> > > On 2014-03-27 08:02:35 -0400, Tom Lane wrote:
> > > > Buildfarm member prairiedog thinks there's something unreliable about
> > > > commit f01d1ae3a104019d6d68aeff85c4816a275130b3:
> > >
> > > So I had made a note to recheck on this.
> > > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
> > > indicates there haven't been any further failures... So, for now I
> > > assume this was caused by some problem fixed elsewhere.
> >
> > Hard to say. In any case, I agree we can't make any progress unless we
> > see it again.
>
> The improved test just tripped:

Hrmpf. Just one of these days I was happy thinking it was gone...

> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Hm. My guess is that it's just a 'harmless' concurrency issue. The test
currently runs concurrently with others: I think what happens is that the
table gets dropped in another session after the query has acquired the
MVCC snapshot (used for the pg_class test). But why is it triggering on
such an 'unusual' system and not on others? That's what worries me a bit.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-06-12 00:38:36 -0400, Noah Misch wrote:
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07
>
> Hm. My guess is that it's just a 'harmless' concurrency issue. The test
> currently runs concurrently with others: I think what happens is that
> the table gets dropped in another session after the query has acquired
> the MVCC snapshot (used for the pg_class test). But why is it triggering
> on such an 'unusual' system and not on others? That's what worries me a
> bit.

prairiedog is pretty damn slow by modern standards. OTOH, I think it is
not the slowest machine in the buildfarm; hamster for instance seems to
be at least a factor of 2 slower. So I'm not sure whether to believe it's
just a timing issue.

			regards, tom lane
Re: [HACKERS] Few observations in replication slots related code
Hi Amit,

On 2014-06-12 08:55:59 +0530, Amit Kapila wrote:
> 1. In function StartupReplicationSlots(XLogRecPtr checkPointRedo),
>    parameter checkPointRedo is not used.

It used to be included in a debug message. Apparently the message was
removed at some point (don't remember it, but I have a memory like a
sieve).

> 2. A few checks are in a different order in functions
>    pg_create_physical_replication_slot() and
>    pg_create_logical_replication_slot():
>
>    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
>        elog(ERROR, "return type must be a row type");
>    check_permissions();
>    CheckLogicalDecodingRequirements();
>
>    I don't think it makes any difference, however having the checks in a
>    similar order seems better unless there is a reason for not doing so.

Can change it.

> 3. Currently there is an Assert (Assert(!MyReplicationSlot);) in
>    pg_create_logical_replication_slot(); is there a need for a similar
>    Assert in pg_create_physical_replication_slot()?

Hm. Shouldn't really matter either way these days, but I guess it
doesn't hurt to add one.

> 4. StartupDecodingContext()
>    {
>    ..
>    context = AllocSetContextCreate(CurrentMemoryContext,
>                                    "Changeset Extraction Context",
>    }
>
>    To be consistent, shall we name this context "logical" or "logical
>    decoding"?

Heh. That apparently escaped when things were renamed. Yes.

> 5. pg_create_logical_replication_slot()
>    {
>    ..
>    CreateInitDecodingContext()
>    ..
>    ReplicationSlotPersist()
>    }
>
>    Function pg_create_logical_replication_slot() is trying to save the
>    slot twice, once during CreateInitDecodingContext() and then in
>    ReplicationSlotPersist(); isn't it better if we can make it do so
>    just once?

Doesn't work here. In the first save it's not yet marked as persistent -
but we still need to safely reserve the xmin. It's also not something
that should happen very frequently, so I'm not worried about the
overhead.

> 6. elog(ERROR, "cannot handle changeset extraction yet");
>
>    Shouldn't it be better to use "logical replication" instead of
>    "changeset extraction"?

Will change.

> 7. + * XXX: It might make sense to introduce ephemeral slots and always use
>    + * the slot mechanism.
>
>    There already are RS_EPHEMERAL slots, so maybe this comment needs a
>    bit of rephrasing.

> 8. Is there a chance of inconsistency if pg_resetxlog resets the xlog
>    and, after restart, one of the slots contains an xlog position older
>    than what was reset by pg_resetxlog?

Yes. There's lots of ways to screw over your database by using
pg_resetxlog. I personally think there should be a required
--yes-i-am-breaking-my-database parameter for pg_resetxlog.

> 9. void
>    LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
>    {
>    ..
>    /*
>     * don't overwrite if we already have a newer xmin. This can happen if we
>     * restart decoding in a slot.
>     */
>    if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin))
>    {
>    }
>    ..
>    }
>
>    Should we just release the spinlock in this branch and return, rather
>    than keeping the no-action branch?

Don't think that'd make it any faster/simpler. We'd be stuck with
duplicate codepaths.

> 10. * Iff ri_type = REPLICA_IDENTITY_INDEX, indexOid must be the Oid of a suitable
>     * index. Otherwise, it should be InvalidOid.
>     */
>    static void
>    relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
>                                   bool is_internal)
>
>    typo - *Iff*

iff := if and only if.

Thanks for the look.

Greetings,

Andres Freund
Re: [HACKERS] Shared memory changes in 9.4?
[redirecting to -hackers]

Re: Robert Haas 2014-05-28 <CA+TgmoaTcAd48zW3auWzGdHi_V+QwA5HVCTabqgTq=zlwpy...@mail.gmail.com>
> On Tue, May 27, 2014 at 8:22 PM, Maciek Sakrejda <m.sakre...@gmail.com> wrote:
> > On Mon, May 26, 2014 at 12:24 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > > Any chance you're using a 9.3 configuration file instead of the one
> > > generated by initdb? dynamic_shared_memory_type defaults to 'posix'
> > > if not specified in the config file (on platforms supporting it). If
> > > initdb detects that 'posix' can't be used it'll emit a different
> > > value. If you're copying the config from 9.3 and your environment
> > > doesn't support posix shm that'll cause the above error.
> > >
> > > I still think dynamic_shared_memory_type should default to 'none'
> > > because of such problems.
> >
> > It works with 'none' and 'sysv' -- I think the issue is that
> > technically our environment does support 'posix', but /dev/shm is
> > indeed not mounted in the LXC container, leading to a discrepancy
> > between what initdb decides and what's actually possible. Thanks for
> > your help.
>
> I think it would be good to understand why initdb isn't getting this
> right. Did you run initdb outside the LXC container, where /dev/shm
> would have worked, but then run postgres inside the LXC container, where
> /dev/shm does not work? I ask because initdb is supposed to be doing the
> same thing that postgres does, so it really ought to come to the same
> conclusion about what will and won't work.
>
> With regard to Andres' proposal, I'm not that keen on setting
> dynamic_shared_memory_type='none' by default. Would we leave it that way
> until we get in-core users of the facility, and then change it? I guess
> that'd be OK, but frankly if enabling dynamic_shared_memory_type by
> default is causing us many problems, then we'd better reconsider the
> design of the facility now, before we start adding more dependencies on
> it. We've already fixed a bunch of DSM-related issues as a result of the
> fact that the default *isn't* none, and I dunno how many of those we
> would have found if the default had been none. I tend to think DSM is an
> important facility that we're going to be wanting to build on in future
> releases, so I'm keen to have it available by default so that we can
> iron out any kinks before we get too far down that path.

Hi,

I've just run into the same problem. I am running the Debian
postgresql-common testsuite, which includes upgrade tests. On upgrades,
the old postgresql.conf is copied to the new server (after initdb and/or
pg_upgrade), and deprecated options are removed/renamed. [*]

In that chroot environment, 9.4 is running fine, except that upgrades
failed because /dev/shm wasn't mounted, and the old 9.3 postgresql.conf
doesn't contain dynamic_shared_memory_type = 'sysv'.

To me, the following should be done:

* Make initdb determine the best shm type for this platform and write it
  into postgresql.conf as it does now.
* If no dynamic_shared_memory_type is found in the config, default to
  "none".
* Modify the three identical error messages concerned with shm segments
  to include the shm type instead of always just saying "FATAL: could not
  open shared memory segment".
* Add a HINT to the POSIX error message: "HINT: This might indicate that
  /dev/shm is not mounted, or its permissions do not allow the database
  user to create files there."

Despite /dev/shm having been around for quite some time, many people seem
to be unaware what POSIX shared memory is, so the HINT is really needed
there. It certainly took me weeks after seeing the error message the
first time till I found the time to dig deeper into the issue - it should
just have been "oh yes, /dev/shm isn't mounted there, that's why".

Christoph

[*] This might not be the smartest thing to do, but it's a simple default
approach to get the new cluster running with as much user config from the
old config as possible.
-- 
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Shared memory changes in 9.4?
Hi,

On 2014-06-12 11:07:31 +0200, Christoph Berg wrote:
> > With regard to Andres' proposal, I'm not that keen on setting
> > dynamic_shared_memory_type='none' by default.

Note that I'm not proposing to disable the whole thing. Just that an
unset dynamic_shared_memory_type doesn't configure dsm. Initdb would
still configure it after probing.

> To me, the following should be done:
>
> * Make initdb determine the best shm type for this platform and write
>   it into postgresql.conf as it does now.
> * If no dynamic_shared_memory_type is found in the config, default to
>   "none".
> * Modify the three identical error messages concerned with shm segments
>   to include the shm type instead of always just saying "FATAL: could
>   not open shared memory segment".
> * Add a HINT to the POSIX error message: "HINT: This might indicate
>   that /dev/shm is not mounted, or its permissions do not allow the
>   database user to create files there."

Sounds like a sane plan to me.

Greetings,

Andres Freund
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On Wed, Jun 11, 2014 at 2:39 AM, Tom Lane wrote:
> I'm not even 100% sold that automatically returning the primary key is
> going to save any application logic. Could somebody point out *exactly*
> where an app is going to save effort with this type of syntax, compared
> to requesting the columns it wants by name?

I haven't checked the code, but I am hoping it will help with the problem
where a "RETURNING *" is added by the JDBC driver to a statement that is
not an insert or update. That has been reported on the JDBC list at least
twice, and the proposed workaround is neither very elegant nor very
robust:
https://groups.google.com/forum/#!msg/pgsql.interfaces.jdbc/7WY60JX3qyo/-v1fqDqLQKwJ

Jochem

-- 
Jochem van Dieten
http://jochem.vandieten.net/
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 14/06/12 18:46, Jochem van Dieten wrote:
> On Wed, Jun 11, 2014 at 2:39 AM, Tom Lane wrote:
> > I'm not even 100% sold that automatically returning the primary key is
> > going to save any application logic. Could somebody point out
> > *exactly* where an app is going to save effort with this type of
> > syntax, compared to requesting the columns it wants by name?
>
> I haven't checked the code, but I am hoping it will help with the
> problem where a "RETURNING *" is added by the JDBC driver to a statement
> that is not an insert or update. That has been reported on the JDBC list
> at least twice, and the proposed workaround is neither very elegant nor
> very robust:
> https://groups.google.com/forum/#!msg/pgsql.interfaces.jdbc/7WY60JX3qyo/-v1fqDqLQKwJ

Unfortunately that seems to be a JDBC-specific issue, which is outside
the scope of this particular patch (which proposes additional server-side
syntax intended to make "RETURNING *" operations more efficient for
certain use cases, but which is in itself not a JDBC change).

Regards

Ian Barwick

-- 
 Ian Barwick                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Few observations in replication slots related code
Hi,

On 2014-06-12 09:15:08 +0200, Andres Freund wrote:
> > 6. elog(ERROR, "cannot handle changeset extraction yet");
> >
> >    Shouldn't it be better to use "logical replication" instead of
> >    "changeset extraction"?
>
> Will change.

I don't see that message anywhere in current code? All of those were
rephrased in b89e151054a05f0f6d356ca52e3b725dd0505e53 (where Robert
changed the term to "logical decoding") and then implemented in
5a991ef8692ed0d170b44958a81a6bd70e90585c.

Pushed a fix for the rest of the ones I commented on earlier. Thanks!

Greetings,

Andres Freund
Re: [HACKERS] replication commands and log_statements
On Wed, Jun 11, 2014 at 11:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > Your wish just seems like a separate feature to me. Including
> > replication commands in 'all' seems correct independent of the desire
> > for a more granular control.
>
> No, I think I've got to vote with the other side on that. The reason we
> can have log_statement as a scalar progression "none < ddl < mod < all"
> is that there's little visible use-case for logging DML but not DDL, nor
> for logging SELECTs but not INSERT/UPDATE/DELETE. However, logging
> replication commands seems like something people would reasonably want
> an orthogonal control for. There's no nice way to squeeze such a
> behavior into log_statement. I guess you could say that log_statement
> treats replication commands as if they were DDL, but is that really
> going to satisfy users? I think we should consider log_statement to
> control logging of SQL only, and invent a separate GUC (or, in the
> future, likely more than one GUC?) for logging of replication activity.

Seems reasonable. OK, the attached patch adds the log_replication_command
parameter, which causes replication commands to be logged. I added this
to the next CF.

Regards,

-- 
Fujii Masao

From 9874e36a8f65667d1f4d3a9a8d1d87d2d20f5188 Mon Sep 17 00:00:00 2001
From: MasaoFujii <masao.fu...@gmail.com>
Date: Thu, 12 Jun 2014 19:32:00 +0900
Subject: [PATCH] Add log_replication_command configuration parameter.

This GUC causes replication commands like IDENTIFY_SYSTEM to be logged
in the server log.
---
 doc/src/sgml/config.sgml                      |   16 ++++++++++++++++
 src/backend/replication/walsender.c           |   11 +++++++++--
 src/backend/utils/misc/guc.c                  |    9 +++++++++
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/replication/walsender.h           |    1 +
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 697cf99..8532f08 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4534,6 +4534,22 @@ FROM pg_stat_activity;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-log-replication-command" xreflabel="log_replication_command">
+      <term><varname>log_replication_command</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>log_replication_command</> configuration parameter</primary>
+       </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Causes each replication command to be logged in the server log.
+        See <xref linkend="protocol-replication"> for more information about
+        replication command. The default value is <literal>off</>.
+        Only superusers can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-log-temp-files" xreflabel="log_temp_files">
       <term><varname>log_temp_files</varname> (<type>integer</type>)
        <indexterm>
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 088ee2c..009ad92 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -108,6 +108,7 @@
 bool		am_db_walsender = false;	/* Connected to a database? */
 int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
 int			wal_sender_timeout = 60 * 1000;	/* maximum time to send one
 											 * WAL data message */
+bool		log_replication_command = false;
 
 /*
  * State for WalSndWakeupRequest
@@ -1261,13 +1262,19 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext old_context;
 
 	/*
+	 * Log replication command if log_replication_command is enabled.
+	 * Even when it's disabled, log the command with DEBUG1 level for
+	 * backward compatibility.
+	 */
+	ereport(log_replication_command ? LOG : DEBUG1,
+			(errmsg("received replication command: %s", cmd_string)));
+
+	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
 	 * command arrives. Clean up the old stuff if there's anything.
 	 */
 	SnapBuildClearExportedSnapshot();
 
-	elog(DEBUG1, "received replication command: %s", cmd_string);
-
 	CHECK_FOR_INTERRUPTS();
 
 	cmd_context = AllocSetContextCreate(CurrentMemoryContext,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1d094f0..427af79 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -931,6 +931,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
+		{"log_replication_command", PGC_SUSET, LOGGING_WHAT,
+			gettext_noop("Logs each replication command."),
+			NULL
+		},
+		&log_replication_command,
+		false,
+		NULL, NULL, NULL
+	},
+	{
 		{"debug_assertions", PGC_USERSET, DEVELOPER_OPTIONS,
 			gettext_noop("Turns on various assertion checks."),
 			gettext_noop("This is a debugging aid."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d109394..70d86cf 100644
---
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another
> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus
> effectively assuming that in-progress inserts and deletes will both
> commit).

Did you plan to backpatch that? My inclination would be no...

> I did that, and found that it helped Tim's test case not at all :-(. A
> bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless
> of whether the transaction has since marked it for deletion:
>
>     /*
>      * It'd be possible to discern between INSERT/DELETE in progress
>      * here by looking at xmax - but that doesn't seem beneficial for
>      * the majority of callers and even detrimental for some. We'd
>      * rather have callers look at/wait for xmin than xmax. It's
>      * always correct to return INSERT_IN_PROGRESS because that's
>      * what's happening from the view of other backends.
>      */
>     return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need
> less laziness + fuzzy thinking here. Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code? Is it *really* the
> case that callers other than VACUUM itself are okay with failing to make
> this distinction? I'm dubious: there are very few if any callers that
> treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is for
< 9.4 and keep returning HEAPTUPLE_INSERT_IN_PROGRESS for 9.4/master.
Would you be ok with that? The second best thing imo would be to discern
and return HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for
the respective cases. Which way would you like to go?

Greetings,

Andres Freund
[HACKERS] Audit of logout
Hi,

Some users enable log_disconnections in postgresql.conf to audit all
logouts. But since log_disconnections is defined with PGC_BACKEND, it can
be changed at connection start. This means that any client (even a
non-superuser) can freely disable log_disconnections so that his or her
logout is not logged, even when the system admin has enabled it in
postgresql.conf. Isn't this problematic for audit purposes?

Regards,

-- 
Fujii Masao
Re: [HACKERS] cancelling statement due to user request error occurs but the transaction has committed.
Hi,

> Well, the only other principled fix I can see is to add a new response
> along the lines of ERRORBUTITCOMMITTED, which does not seem attractive
> either, since all clients will have to be taught to understand it.

+1

I think the current specification is hard to understand for many users.
It would be really good if PostgreSQL gave us a message such as the
synchronous replication abort warning:

###
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not
have been replicated to the standby.
###

Regards,

Naoya
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: anzai-na...@mxu.nes.nec.co.jp
---
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On Thu, Jun 12, 2014 at 12:25 PM, Ian Barwick wrote:
> On 14/06/12 18:46, Jochem van Dieten wrote:
> > I haven't checked the code, but I am hoping it will help with the
> > problem where a "RETURNING *" is added by the JDBC driver to a
> > statement that is not an insert or update. That has been reported on
> > the JDBC list at least twice, and the proposed workaround is neither
> > very elegant nor very robust:
> > https://groups.google.com/forum/#!msg/pgsql.interfaces.jdbc/7WY60JX3qyo/-v1fqDqLQKwJ
>
> Unfortunately that seems to be a JDBC-specific issue, which is outside
> the scope of this particular patch (which proposes additional
> server-side syntax intended to make "RETURNING *" operations more
> efficient for certain use cases, but which is in itself not a JDBC
> change).

But the obvious way to fix the JDBC issue is not to fix it by adding a
'mini parser' on the JDBC side, but to make "SELECT ... RETURNING PRIMARY
KEY" a regular select that silently ignores the RETURNING clause and
doesn't throw an error on the server side. That might still be outside
the scope of this particular patch, but it would provide (additional)
justification if it were supported.

Jochem
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 2014-06-12 13:58:31 +0200, Jochem van Dieten wrote:
> But the obvious way to fix the JDBC issue is not to fix it by adding a
> 'mini parser' on the JDBC side, but to make "SELECT ... RETURNING
> PRIMARY KEY" a regular select that silently ignores the RETURNING
> clause and doesn't throw an error on the server side.

Brr. Then it'd need to be added to all statements, not just SELECT. I've
my doubts that's going to happen.

Greetings,

Andres Freund
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 14/06/12 20:58, Jochem van Dieten wrote:
> But the obvious way to fix the JDBC issue is not to fix it by adding a
> 'mini parser' on the JDBC side, but to make "SELECT ... RETURNING
> PRIMARY KEY" a regular select that silently ignores the RETURNING
> clause and doesn't throw an error on the server side. That might still
> be outside the scope of this particular patch, but it would provide
> (additional) justification if it were supported.

That would be adding superfluous, unused and unusable syntax of no
potential value (there is no "SELECT ... RETURNING" and it wouldn't make
any sense if there was) as a workaround for a driver issue - not going to
happen.

Regards

Ian Barwick
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
Robert Haas <robertmh...@gmail.com> wrote:
> Even aside from security exposures, how does a non-superuser who runs
> pg_dump know whether they've got a complete backup or a filtered dump
> that's missing some rows?

This seems to me to be a killer objection to the feature as proposed, and
points out a huge difference between column-level security and the
proposed implementation of row-level security. (In fact it is a
difference between just about any GRANTed permission and row-level
security.) If you try to SELECT * FROM sometable and you don't have
rights to all the columns, you get an error. A dump would always either
work as expected or generate an error.

test=# create user bob;
CREATE ROLE
test=# create user bill;
CREATE ROLE
test=# set role bob;
SET
test=> create table person (person_id int not null primary key, name text not null, ssn text);
CREATE TABLE
test=> grant select (person_id, name) on table person to bill;
GRANT
test=> reset role;
RESET
test=# set role bill;
SET
test=> select person_id, name from person;
 person_id | name
-----------+------
(0 rows)

test=> select * from person;
ERROR:  permission denied for relation person

The proposed approach would leave the validity of any dump which was not
run as a superuser in doubt. The last thing we need, in terms of
improving security, is another thing you can't do without connecting as a
superuser.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Shared memory changes in 9.4?
Re: Andres Freund 2014-06-12 <20140612094112.gz8...@alap3.anarazel.de>
> > * Make initdb determine the best shm type for this platform and write
> >   it into postgresql.conf as it does now.
> > * If no dynamic_shared_memory_type is found in the config, default to
> >   "none".
> > * Modify the three identical error messages concerned with shm
> >   segments to include the shm type instead of always just saying
> >   "FATAL: could not open shared memory segment".
> > * Add a HINT to the POSIX error message: "HINT: This might indicate
> >   that /dev/shm is not mounted, or its permissions do not allow the
> >   database user to create files there."
>
> Sounds like a sane plan to me.

Here are two patches, one that implements the annotated error messages,
and one that selects "none" as default.

It might also make sense to add a note that POSIX depends on /dev/shm,
and also a note that dynamic_shared_memory_type is not related to the
shared_buffers shm segment, which I didn't include here.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
new file mode 100644
index 0819641..780e3a5
*** a/src/backend/storage/ipc/dsm_impl.c
--- b/src/backend/storage/ipc/dsm_impl.c
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 289,296 ****
  		if (errno != EEXIST)
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not open shared memory segment \"%s\": %m",
! 							name)));
  		return false;
  	}
--- 289,299 ----
  		if (errno != EEXIST)
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not open POSIX shared memory segment \"%s\": %m",
! 							name),
! 					 errhint("This error usually means that /dev/shm is not mounted, or its "
! 							 "permissions do not allow the database user to create files "
! 							 "there.")));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 313,319 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not stat shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 316,322 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not stat POSIX shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 332,338 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not resize shared memory segment %s to %zu bytes: %m",
  						name, request_size)));
  		return false;
  	}
--- 335,341 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not resize POSIX shared memory segment %s to %zu bytes: %m",
  						name, request_size)));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 358,364 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 361,367 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap POSIX shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 382,388 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not map shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 385,391 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not map POSIX shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_sysv(dsm_op op, dsm_handle hand
*** 512,518 ****
  			errno = save_errno;
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not get shared memory segment: %m")));
  		}
  		return false;
  	}
--- 515,521 ----
  			errno = save_errno;
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not get System V shared memory segment: %m")));
  		}
  		return false;
  	}
*** dsm_impl_sysv(dsm_op op, dsm_handle hand
*** 530,536 ****
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 533,539 ----
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap System V shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_sysv(dsm_op op, dsm_handle hand
*** 540,546 ****
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not remove shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 543,549 ----
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could
Re: [HACKERS] replication commands and log_statements
On Wed, Jun 11, 2014 at 7:42 AM, Magnus Hagander mag...@hagander.net wrote: Replication commands like IDENTIFY_COMMAND are not logged even when log_statements is set to all. Some users who use log_statements to audit *all* statements might dislike this current situation. So I'm thinking to change log_statements, or to add something like log_replication, so that we can log replication commands. Thoughts? +1. I think adding a separate parameter is the way to go. The other option would be to turn log_statements into a parameter where you specify multiple values I kind of like this idea, but... - so instead of 'all' today it would be 'ddl,dml,all' or something like that, and then you'd also add 'replication' as an option. But that would cause all sorts of backwards-compatibility annoyances. And do you really want to be able to say things like 'ddl,all', meaning you'd get ddl and select but not dml? ...you lost me here. I mean, I think it could be quite useful to redefine the existing GUC as a list. We could continue to have ddl, dml, and all as tokens that would be in the list, but you wouldn't write 'ddl,dml,all' because 'all' would include everything that those other ones would log. But then you could have combinations like 'dml,replication' and so on. And you could do much more fine-grained things, like allow log_statement=create,alter,drop to log all such statements but not, for example, CLUSTER. I think if we go the route of adding a separate GUC for this, we're going to get tired of adding GUCs way before we've come close to meeting the actual requirements in this area. A comma-separated list of tokens seems to offer a lot more flexibility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
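[If the list-of-tokens idea sketched in this thread were adopted, the configuration might look roughly like this. This is purely hypothetical syntax taken from the discussion above; no released server accepts these list values:]

```
# hypothetical postgresql.conf syntax -- not implemented
log_statement = 'ddl,replication'     # log DDL plus replication commands

# or fine-grained, per-command tokens:
log_statement = 'create,alter,drop'   # but not, for example, CLUSTER
```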
Re: [HACKERS] postgresql.auto.conf read from wrong directory
On Tue, May 27, 2014 at 2:05 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Sun, May 11, 2014 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's clearly *necessary* to forbid setting data_directory in postgresql.auto.conf. The file is defined to be found in the data directory, so any such setting is circular logic by definition; no good can come of not rejecting it. We already have a GUC flag bit about disallowing certain variables in the config file (though I don't remember if it's enforced or just advisory). It seems to me that we'd better invent one for disallowing in ALTER SYSTEM, as well. I introduced a new flag bit (GUC_DISALLOW_IN_AUTO_FILE) to mark parameters that ALTER SYSTEM may not set, and disallowed data_directory accordingly. Shouldn't we document what types of parameters are not allowed to be set by ALTER SYSTEM SET? Also, data_directory was displayed when I typed TAB just after ALTER SYSTEM SET; the tab-completion for ALTER SYSTEM SET probably needs to be changed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
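[For illustration, the intended effect of the GUC_DISALLOW_IN_AUTO_FILE flag would look something like this in psql. A sketch only; the exact error wording is an assumption, not taken from the patch:]

```sql
-- Ordinary settings are written to postgresql.auto.conf:
ALTER SYSTEM SET work_mem = '64MB';

-- data_directory must be rejected: postgresql.auto.conf lives in the
-- data directory, so the setting would be circular by definition.
ALTER SYSTEM SET data_directory = '/somewhere/else';
-- ERROR:  parameter "data_directory" cannot be changed
```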
Re: [HACKERS] Few observations in replication slots related code
On Thu, Jun 12, 2014 at 5:02 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-12 09:15:08 +0200, Andres Freund wrote: 6. elog(ERROR, "cannot handle changeset extraction yet"); Wouldn't it be better to say logical replication instead of changeset extraction? Will change. I don't see that message anywhere in current code? Right; I was actually reading code from the Git history while also referring to the latest code, so it seems I saw that in the original commit and missed verifying it against the latest code. While checking the latest code, I did find a use of *changeset extraction* in the comment below:

/* ..
 *
 * This is useful to initialize the cutoff xid after which a new
 * *changeset extraction* replication slot can start decoding changes.
 *
 * ..
 */

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On 2014-06-12 10:48:01 -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? I prefer lo_create_bytea() or even lo_create_from_bytea(), lo_from_bytea(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Tom Lane wrote: Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? I was also going to suggest lo_create_bytea(). Another similar possibility would be lo_from_bytea() -- or, since we have overloading (and we can actually use it in this case without breaking libpq), we could just have lo_from(oid, bytea). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shared memory changes in 9.4?
On Thu, Jun 12, 2014 at 5:41 AM, Andres Freund and...@2ndquadrant.com wrote: With regard to Andres' proposal, I'm not that keen on setting dynamic_shared_memory_type='none' by default. Note that I'm not proposing to disable the whole thing. Just that an unset dynamic_shared_memory_type doesn't configure dsm. Initdb would still configure it after probing. OK, I misunderstood your position; thanks for clarifying. I think that would be OK with me. To some degree, I think that the test setup is broken by design: while we try to maintain backward compatibility for postgresql.conf files, there's never been any guarantee that an old postgresql.conf file will work on a newer server. For example, a whole lot of pre-8.4 users probably had max_fsm_pages and max_fsm_relations configured, and with 8.4, those settings went away. Fixing that kind of thing is an essential part of the upgrade process. That having been said, in this particular case, we can probably ease the pain without much downside by doing as you suggest. The only thing I'm worried about is that people will discover much later that they don't have working dynamic shared memory, and be unhappy about that. Sometimes it's better to complain loudly at the beginning than to leave buried problems for later. But I'll defer to the majority on what to do in this instance.

To me, the following should be done:

* Make initdb determine the best shm type for this platform and write it into postgresql.conf as it does now.
* If no dynamic_shared_memory_type is found in the config, default to none.
* Modify the three identical error messages concerned about shm segments to include the shm type instead of always just saying FATAL: could not open shared memory segment
* Add a HINT to the POSIX error message: HINT: This might indicate that /dev/shm is not mounted, or its permissions do not allow the database user to create files there

Sounds like a sane plan to me.

+1 to the rest of this.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On Thu, Jun 12, 2014 at 10:56 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? I was also going to suggest lo_create_bytea(). That sounds good to me, too. Presumably we should also fix libpq to not be so dumb. I mean, it doesn't help with the immediate problem, since as you say there could be non-upgraded copies of libpq out there for the indefinite future, but it still seems like something we oughta fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Any other votes out there? I was also going to suggest lo_create_bytea(). Another similar possibility would be lo_from_bytea() -- or, since we have overloading (and we can actually use it in this case without breaking libpq), we could just have lo_from(oid, bytea). Andres also mentioned lo_from_bytea(), and I kinda like it too. I don't like plain lo_from(), as it's not too apparent what it's supposed to get data from --- someone might think the second parameter was supposed to be a file name a la lo_import(), for example. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
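[For reference, the function under discussion takes an OID (0 meaning "let the server assign one") plus the data; with the name settled on below in the thread, usage would look like this. A sketch based on the signatures discussed here; lo_get() is the companion read function added in the same 9.4 work:]

```sql
-- Create a large object from bytea, letting the server pick the OID,
-- then read its contents back in one go:
SELECT lo_get(lo_from_bytea(0, '\xdeadbeef'::bytea));
-- returns \xdeadbeef
```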
Re: [HACKERS] Suppressing unused subquery output columns
On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: The attached draft patch fixes this by deleting unused output expressions from unflattened subqueries, so that we get:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
                             QUERY PLAN
--------------------------------------------------------------------
 Subquery Scan on ss  (cost=0.00..0.02 rows=1 width=4)
   ->  Limit  (cost=0.00..0.01 rows=1 width=4)
         ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
 Planning time: 0.193 ms
(4 rows)

I'm not entirely convinced that it's worth the extra planning cycles, though. Given the small number of complaints to date, it might not be worth doing this. Thoughts? I've encountered this issue before and think it would be great to fix it. I'm not sure precisely how many cycles it's worth paying, but definitely more than zero. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
lo_from_bytea sounds better to me than lo_create_bytea
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Robert Haas robertmh...@gmail.com writes: Presumably we should also fix libpq to not be so dumb. I mean, it doesn't help with the immediate problem, since as you say there could be non-upgraded copies of libpq out there for the indefinite future, but it still seems like something we oughta fix. It's been in the back of my mind for awhile that doing a dynamic query at all here is pretty pointless. It's not like the OIDs of those functions ever have or ever will move. It would probably be more robust if we just let libpq be a consumer of fmgroids.h and refer directly to the constants F_LO_CREATE etc. I think there was some previous discussion about this, possibly tied to also having a better-defined way to let clients depend on hard-wired type OIDs, but I'm too lazy to search the archives right now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shared memory changes in 9.4?
Hi, On 2014-06-12 11:08:35 -0400, Robert Haas wrote: On Thu, Jun 12, 2014 at 5:41 AM, Andres Freund and...@2ndquadrant.com wrote: With regard to Andres' proposal, I'm not that keen on setting dynamic_shared_memory_type='none' by default. Note that I'm not proposing to disable the whole thing. Just that an unset dynamic_shared_memory_type doesn't configure dsm. Initdb would still configure it after probing. OK, I misunderstood your position; thanks for clarifying. I think that would be OK with me. To some degree, I think that the test setup is broken by design: while we try to maintain backward compatibility for postgresql.conf files, there's never been any guarantee that an old postgresql.conf file will work on a newer server. For example, a whole lot of pre-8.4 users probably had max_fsm_pages and max_fsm_relations configured, and with 8.4, those settings went away. Fixing that kind of thing is an essential part of the upgrade process. While I think I see where you're coming from, I don't think these cases are comparable. In this case, commenting out the GUC can prevent the server from starting. That's pretty odd, imo. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suppressing unused subquery output columns
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: The attached draft patch fixes this by deleting unused output expressions from unflattened subqueries, so that we get: ... I'm not entirely convinced that it's worth the extra planning cycles, though. Given the small number of complaints to date, it might not be worth doing this. Thoughts? I've encountered this issue before and think it would be great to fix it. I'm not sure precisely how many cycles it's worth paying, but definitely more than zero. We have a couple votes for this patch and no one has spoken against it, so I'll go ahead and push it into HEAD. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Jun 5, 2014 at 8:37 PM, Peter Geoghegan p...@heroku.com wrote: * A configure AC_TRY_RUN tests the suitability of the system strxfrm() implementation for the purposes of this optimization. There can be no header bytes (people at pgCon reported redundant bytes in the Mac OS X implementation, to my great surprise), ...

I'm attaching a small test program demonstrating the Mac behavior. Here's some sample output:

[rhaas ~]$ ./strxfrm en_US "" a aa ab abc abcd abcde peter Geoghegan _ %
 - 
a - 001S001S
aa - 001S001S001S001S
ab - 001S001T001S001T
abc - 001S001T001U001S001T001U
abcd - 001S001T001U001V001S001T001U001V
abcde - 001S001T001U001V001W001S001T001U001V001W
peter - 001b001W001f001W001d001b001W001f001W001d
Geoghegan - 0019001W001a001Y001Z001W001Y001S001`0019001W001a001Y001Z001W001Y001S001`
_ - 001Q001Q
% - 000W000W

It appears that any string starting with the letter a will create output that begins with 001S00, and the seventh character always appears to be 0 or 1:

[rhaas ~]$ ./strxfrm en_US ab ac ad ae af a% a0 "a "
ab - 001S001T001S001T
ac - 001S001U001S001U
ad - 001S001V001S001V
ae - 001S001W001S001W
af - 001S001X001S001X
a% - 001S000W001S000W
a0 - 001S000b001S000b
a  - 001S000R001S000R

Also, the total number of bytes produced is apparently 8N+4, where N is the length of the input string. On a Linux system I tested, the output included non-printable characters, so I adapted the test program to print the results in hex. Attaching that version, too.
Here, the length was 3N+2, except for 0-length strings which produce a 0-length result:

[rhaas@hydra ~]$ ./strxfrm-binary en_US "" a aa ab abc abcd abcde peter Geoghegan _ %
 - (0 bytes)
a - 0c01020102 (5 bytes)
aa - 0c0c010202010202 (8 bytes)
ab - 0c0d010202010202 (8 bytes)
abc - 0c0d0e0102020201020202 (11 bytes)
abcd - 0c0d0e0f01020202020102020202 (14 bytes)
abcde - 0c0d0e0f10010202020202010202020202 (17 bytes)
peter - 1b101f101d010202020202010202020202 (17 bytes)
Geoghegan - 12101a121310120c190102020202020202020201040202020202020202 (29 bytes)
_ - 0101010112 (5 bytes)
% - 0101010139 (5 bytes)

[rhaas@hydra ~]$ ./strxfrm-binary en_US ab ac ad ae af a% a0 "a "
ab - 0c0d010202010202 (8 bytes)
ac - 0c0e010202010202 (8 bytes)
ad - 0c0f010202010202 (8 bytes)
ae - 0c10010202010202 (8 bytes)
af - 0c11010202010202 (8 bytes)
a% - 0c01020102010239 (8 bytes)
a0 - 0c02010202010202 (8 bytes)
a  - 0c01020102010211 (8 bytes)

Even though each input byte is generating 3 output bytes, it's not generating a group of output bytes for each input byte as appears to be happening on Mac OS X. If it were doing that, then truncating the blob to 8 bytes would only compare the first 2-3 bytes of the input string. In fact we do better. If we change even the 8th letter in the string to some other letter, the 8th output byte changes:

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaab
aaaaaaaa - 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
aaaaaaab - 0c0c0c0c0c0c0c0d010202020202020202010202020202020202 (26 bytes)

If we change the capitalization of the eighth byte, then the change happens further out:

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaaA
aaaaaaaa - 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
aaaaaaaA - 0c0c0c0c0c0c0c0c010202020202020202010202020202020204 (26 bytes)

Still, it's fair to say that on this Linux system, the first 8 bytes capture a significant portion of the entropy of the first 8 bytes of the string, whereas on Mac OS X you only get entropy from the first 2 bytes of the string.
It would be interesting to see results from other platforms people might care about also. The cost of adding all of these ameliorating measures appears to be very low. We're so bottlenecked on memory bandwidth that the fixed-size overhead of maintaining poor man's cardinality, and the extra cycles from hashing n keys for the benefit of HyperLogLog, just don't seem to matter next to the big savings for n log n comparisons. The best case performance is seemingly just as good as before (about a 200% improvement in transaction throughput for one client, but closer to a 250% improvement with many clients and/or expensive collations), while the worst case is much, much better. I haven't looked at the patch yet, but this sounds promising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <locale.h>
#include <string.h>

int
main(int argc, char **argv)
{
	char	   *buffer = NULL;
	size_t		buflen = 0;

	if (argc < 3)
	{
		fprintf(stderr, "Usage: strxfrm {collation} {string}...\n");
		exit(1);
	}

	if (setlocale(LC_COLLATE, argv[1]) == NULL)
	{
		fprintf(stderr, "setlocale: %s\n", strerror(errno));
		exit(1);
	}

	argv += 2;
	while (*argv)
	{
		size_t		r;

		r = strxfrm(buffer, *argv, buflen);
		if (r >= buflen)
		{
			if (buffer != NULL)
				free(buffer);
Re: [HACKERS] Proposing pg_hibernate
On Thu, Jun 12, 2014 at 12:17 AM, Gurjeet Singh gurj...@singh.im wrote: On Wed, Jun 11, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 10, 2014 at 10:03 PM, Gurjeet Singh gurj...@singh.im wrote: And it's probably accepted by now that such behaviour is not catastrophic, merely inconvenient. I think the whole argument for having pg_hibernator is that getting the block cache properly initialized is important. If it's not important, then we don't need pg_hibernator in the first place. But if it is important, then I think not loading unrelated blocks into shared_buffers is also important. I was constructing a contrived scenario, something that would rarely happen in reality. I feel that the benefits of this feature greatly outweigh the minor performance loss caused in such an unlikely scenario. So, are you proposing this for inclusion in PostgreSQL core? If not, I don't think there's much to discuss here - people can use it or not as they see fit, and we'll see what happens and perhaps design improvements will result, or not. If so, that's different: you'll need to demonstrate the benefits via convincing proof points, and you'll also need to show that the disadvantages are in fact minor and that the scenario is in fact unlikely. So far there are zero performance numbers on this thread, a situation that doesn't meet community standards for a performance patch. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
refactoring allpaths.c (was Re: [HACKERS] Suppressing unused subquery output columns)
I wrote: We have a couple votes for this patch and no one has spoken against it, so I'll go ahead and push it into HEAD. BTW, I forgot to mention that while working on this patch I was thinking it's past time to separate out the subquery support in allpaths.c into its own file. With this patch, allpaths.c is 2363 lines, about 690 of which are set_subquery_pathlist and subsidiary functions. While that might not be quite tail-wagging-dog level, I think it's enough to justify splitting that code out into a new file, say optimizer/path/subquerypath.c. There are also about 630 lines devoted to appendrel path generation, which might be thought enough to deserve a separate file of its own. I'm a bit less excited about that though, mainly because the appendrel logic has some not-quite-arms-length interactions with set_rel_size(); but there's certainly some case to be made for doing it. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Jun 12, 2014 at 1:25 PM, Robert Haas robertmh...@gmail.com wrote: It appears that any string starting with the letter a will create output that begins with 001S00, and the seventh character always appears to be 0 or 1:

[rhaas ~]$ ./strxfrm en_US ab ac ad ae af a% a0 "a "
ab - 001S001T001S001T
ac - 001S001U001S001U
ad - 001S001V001S001V
ae - 001S001W001S001W
af - 001S001X001S001X
a% - 001S000W001S000W
a0 - 001S000b001S000b
a  - 001S000R001S000R

...

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaaA
aaaaaaaa - 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
aaaaaaaA - 0c0c0c0c0c0c0c0c010202020202020202010202020202020204 (26 bytes)

Still, it's fair to say that on this Linux system, the first 8 bytes capture a significant portion of the entropy of the first 8 bytes of the string, whereas on Mac OS X you only get entropy from the first 2 bytes of the string. It would be interesting to see results from other platforms people might care about also.

If you knew Mac's output character set with some certainty, you could compress it rather efficiently by applying a tabulated decode back into non-printable bytes. Say, like base64 decoding (the set appears to be a subset of base64, but it's hard to be sure). I.e.,

x = strxfrm(s)
xz = hex(tab[x[0]] + 64*tab[x[1]] + 64^2*tab[x[2]] ... etc)

This can be made rather efficient. But the first step is defining with some certainty the output character set.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
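[The tabulated-decode idea above can be sketched concretely. This is only an illustration: the alphabet below is an invented stand-in (the real Mac output set would have to be determined first, as noted), and the packing assumes exactly 64 symbols, i.e. 6 bits each:]

```python
# Sketch: map each printable strxfrm() output symbol back to a 6-bit
# value, then pack the 6-bit values into bytes (4 symbols -> 3 bytes).
# ALPHABET is an assumption for illustration; it must be listed in
# collation order so that packing preserves comparison order.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"
TAB = {c: i for i, c in enumerate(ALPHABET)}

def compress(xfrm: str) -> bytes:
    bits = 0
    nbits = 0
    out = bytearray()
    for ch in xfrm:
        bits = (bits << 6) | TAB[ch]
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        # left-justify the trailing partial byte so prefixes sort first
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

print(compress("001S001S").hex())  # prints 00005c00005c
```

[For equal-length inputs, the packed form compares the same as the original in 25% less space, so more of the transformed key would fit into a fixed-size abbreviation.]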
[HACKERS] add line number as prompt option to psql
Hi all,

The attached WIP patch adds a prompt option for psql that shows the current line number. If the user makes a syntax error in a long SQL statement, psql reports it as follows:

ERROR:  syntax error at or near "a"
LINE 250: hoge
          ^

psql shows where the syntax error occurred, but that is not much help when the SQL statement is very long. We can write SQL with the \e (editor) command (e.g., emacs) and see line numbers there, but that clutters the terminal log and makes analyzing the log after an error more difficult. (I think we usually use copy and paste.)

With this patch applied, %l becomes available as a prompt option, e.g.:

$ cat ~/.psqlrc
\set PROMPT2 '%/[%l]%R%# '
\set PROMPT1 '%/[%l]%R%# '

$ psql -d postgres
postgres[1]=# select
postgres[2]-# *
postgres[3]-# from
postgres[4]-# hoge;

The past discussion is here: http://www.postgresql.org/message-id/cafj8prc1rupk6+cha1jpxph3us_zipabdovmceex4wpbp2k...@mail.gmail.com

Please give me feedback.

Regards,

---
Sawada Masahiko

psql-line-number_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
I wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I was also going to suggest lo_create_bytea(). Another similar possibility would be lo_from_bytea() -- or, since we have overloading (and we can actually use it in this case without breaking libpq), we could just have lo_from(oid, bytea). Andres also mentioned lo_from_bytea(), and I kinda like it too. I don't like plain lo_from(), as it's not too apparent what it's supposed to get data from --- someone might think the second parameter was supposed to be a file name a la lo_import(), for example. Since the discussion seems to have trailed off, I'm going to run with lo_from_bytea(). The plan is:

1. Rename the function.

2. Add an opr_sanity regression test memorializing what we should get from lo_initialize()'s query.

3. Make sure that the regression tests leave behind a few large objects, so that testing of pg_dump/pg_restore using the regression database will exercise the large-object code paths.

It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? Redoing or getting rid of lo_initialize()'s query would be a good thing too; but that does not seem like material for back-patching into 9.4, while I think all the above are. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Question about partial functional indexes and the query planner
On Wed, Jun 11, 2014 at 7:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jun 10, 2014 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Given the lack of previous complaints, I'm not sure this amounts to a back-patchable bug, but it does seem like something worth fixing going forward. Agreed, although I'd be willing to see us slip it into 9.4. It's doubtful that anyone will get upset if their query plans change between beta1 and beta2, but the same cannot be said for released branches. After further thought about this I realized that there's another category of proof possibilities that is pretty simple to add while we're at it. Namely, once we've found that both subexpressions of the two operator clauses are equal(), we can use btree opclass relationships to prove that, say, x < y implies x <= y, or x < y refutes x > y, independently of just what x and y are. (Well, they have to be immutable expressions, but we'd not get this far if they're not.) We already had pretty nearly all of the machinery for that, but again it was only used for proving cases involving comparisons to constants. A little bit of refactoring later, I offer the attached draft patch. I'm thinking this is probably more than we want to slip into 9.4 at this point, but I'll commit it to HEAD soon if there are not objections. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

I applied Tom's patch to the latest HEAD (e04a9ccd2ccd6e31cc4af6b08257a0a186d0fce8) and showed it to Brian; it looks to solve the problem he originally reported:

$ patch -p1 -i ../better-predicate-proofs-1.patch
(Stripping trailing CRs from patch.)
patching file src/backend/optimizer/util/predtest.c

$ /opt/pgsql_patch_review/bin/psql postgres
Timing is on.
Null display (null) is «NULL».
Expanded display (expanded) is used automatically.
psql (9.5devel)
Type "help" for help.
postgres=# CREATE OR REPLACE FUNCTION public.return_if_even(v_id integer) returns integer
postgres-# LANGUAGE sql AS
postgres-# $$
postgres$# SELECT case when v_id % 2 = 1 then 0 else v_id end;
postgres$# $$;
CREATE FUNCTION
Time: 44.669 ms

postgres=# create table public.partial_functional_index_test as
postgres-# select id from generate_series(1,100) AS s(id);
SELECT 100
Time: 1037.993 ms

postgres=# create index partial_functional_idx ON public.partial_functional_index_test
postgres-# USING btree ( public.return_if_even(id) )
postgres-# WHERE public.return_if_even(id) = id;
LOG:  sending cancel to blocking autovacuum PID 12521
DETAIL:  Process 12424 waits for ShareLock on relation 16385 of database 12217.
STATEMENT:  create index partial_functional_idx ON public.partial_functional_index_test USING btree ( public.return_if_even(id) ) WHERE public.return_if_even(id) = id;
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table postgres.public.partial_functional_index_test
CREATE INDEX
Time: 1658.245 ms

postgres=# explain analyze select count(1) from public.partial_functional_index_test where public.return_if_even(id) = id;
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Aggregate  (cost=4818.05..4818.06 rows=1 width=0) (actual time=2503.851..2503.854 rows=1 loops=1)
   ->  Bitmap Heap Scan on partial_functional_index_test  (cost=82.67..4805.55 rows=5000 width=0) (actual time=43.724..1309.309 rows=50 loops=1)
         Recheck Cond: (CASE WHEN ((id % 2) = 1) THEN 0 ELSE id END = id)
         Heap Blocks: exact=4425
         ->  Bitmap Index Scan on partial_functional_idx  (cost=0.00..81.42 rows=5000 width=0) (actual time=42.961..42.961 rows=50 loops=1)
 Planning time: 4.245 ms
 Execution time: 2505.281 ms
(7 rows)

Time: 2515.344 ms

postgres=# explain analyze select count(1) from public.partial_functional_index_test where id = public.return_if_even(id);
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Aggregate  (cost=4818.05..4818.06 rows=1 width=0) (actual time=2483.862..2483.866 rows=1 loops=1)
   ->  Bitmap Heap Scan on partial_functional_index_test  (cost=82.67..4805.55 rows=5000 width=0) (actual time=40.704..1282.955 rows=50 loops=1)
         Recheck Cond: (CASE WHEN ((id % 2) = 1) THEN 0 ELSE id END = id)
         Heap Blocks: exact=4425
         ->  Bitmap Index Scan on partial_functional_idx  (cost=0.00..81.42 rows=5000 width=0) (actual time=39.657..39.657 rows=50 loops=1)
 Planning time: 0.127 ms
 Execution time: 2483.979 ms
(7 rows)

-- Keith Fiske Database
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote: Since the discussion seems to have trailed off, I'm going to run with lo_from_bytea(). The plan is: 1. Rename the function. 2. Add an opr_sanity regression test memorializing what we should get from lo_initialize()'s query. 3. Make sure that the regression tests leave behind a few large objects, so that testing of pg_dump/pg_restore using the regression database will exercise the large-object code paths. Sounds good. It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? The pg_upgrade test suite covers this well. Redoing or getting rid of lo_initialize()'s query would be a good thing too; but that does not seem like material for back-patching into 9.4, while I think all the above are. I agree all the above make sense for 9.4. I also wouldn't object to a hardening of lo_initialize() sneaking in at this stage. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
On Tue, Jun 10, 2014 at 10:36:07AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Mon, Jun 09, 2014 at 09:04:02PM -0400, Peter Eisentraut wrote: I'd consider just getting rid of the (c-file-style . bsd) setting in .dir-locals.el and put the actual interesting settings from the style in there. Another alternative is to change emacs.samples to use hack-local-variables-hook instead, as described here: http://www.emacswiki.org/emacs/LocalVariables Those are reasonable alternatives. The ultimate effect looks similar for all three options, to the extent that I don't see a basis for forming a strong preference. Do you have a recommendation? The more Ludd^H^H^Hcautious among us run with (setq enable-local-variables nil) and would rather not see the configuration recommendations overcomplicated due to an attempt to play nice with directory-local settings we're ignoring anyway. So I'd vote for Peter's first option, ie, kluge up .dir-locals.el not the configuration code. Seeing the two votes cast and only cosmetic differences between the options, I'll just stick with my original -v1. Thanks. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make check For Extensions
Andres said during the unconference last month that there was a way to get `make check` to work with PGXS. The idea is that it would initialize a temporary cluster, start it on an open port, install an extension, and run the extension's test suite. I think via pg_regress --temp-install, maybe? I poked through the PGXS makefiles; it looks like there *might* be something like this for in-core contrib extensions, but not for externally-distributed extensions. Is there something I could add to my extension Makefiles so that `make check` or `make test` will do a pre-install test on a temporary cluster? My 0.02€: It is expected to work, more or less, see the end of http://www.postgresql.org/docs/9.3/static/extend-pgxs.html It invokes psql which is expected to work directly. Note that there is no temporary installation, it is tested against the installed and running postgres. Maybe having the ability to create a temporary installation, as you suggest, would be a nice extension. -- Fabien.
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Noah Misch n...@leadboat.com writes: On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote: It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? The pg_upgrade test suite covers this well. Um, not really: what pg_upgrade exercises is pg_dump -s which entirely fails to cover the data-transfer code paths. It would not have found this problem. BTW, after further testing I realized that it was quite accidental that I found it either. pg_restore only uses libpq's lo_create() function when restoring an old_blob_style archive, ie one generated by 8.4 or earlier. That's what I happened to try to do last night, but it's pure luck that I did. Poking around with making the largeobject regression test leave some stuff behind, I found out that: 1. That regression test includes the text of a Robert Frost poem that AFAICT is still under copyright. I think we'd better replace it with something by someone a bit more safely dead. 2. I tried to add a COMMENT ON LARGE OBJECT to one of the not-removed blobs. While pg_upgrade didn't fail on transferring the blob, it *did* fail to transfer the comment on it, which seems like a bug. Bruce, have you got any idea how to fix that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make check For Extensions
On Jun 12, 2014, at 11:28 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: My 0.02€: It is expected to work, more or less, see the end of http://www.postgresql.org/docs/9.3/static/extend-pgxs.html That says: “The scripts listed in the REGRESS variable are used for regression testing of your module, which can be invoked by make installcheck after doing make install. For this to work you must have a running PostgreSQL server.” That does not mean that it starts a new cluster on a port. It means it will test it against an existing cluster after you have installed into that cluster. It invokes psql which is expected to work directly. Note that there is no temporary installation, it is tested against the installed and running postgres. Maybe having the ability to create a temporary installation, as you suggest, would be a nice extension. Yes, that’s what I would like, so I could test *before* installing. Best, David signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we're going to do this, I would say that we should also take the opportunity to get out from under the question of which kernel API we're dealing with. So perhaps a design like this: 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's the name of a file to write something into after forking. The typical value would be /proc/self/oom_score_adj or /proc/self/oom_adj. If not set, nothing happens. 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's the string we write, otherwise we write 0. Please find attached the patch. It includes the doc changes as well. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 9fadef5..4492a1d 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1284,8 +1284,15 @@
 echo -1000 > /proc/self/oom_score_adj
 in the postmaster's startup script just before invoking the postmaster.
 Note that this action must be done as root, or it will have no effect;
 so a root-owned startup script is the easiest place to do it.  If you
-do this, you may also wish to build <productname>PostgreSQL</>
-with <literal>-DLINUX_OOM_SCORE_ADJ=0</> added to <varname>CPPFLAGS</>.
+do this, you may also wish to export these environment variables
+<programlisting>
+PG_OOM_ADJUST_FILE=oom_score_adj
+PG_OOM_ADJUST_VALUE=0
+
+export PG_OOM_ADJUST_FILE
+export PG_OOM_ADJUST_VALUE
+</programlisting>
+in the startup script, before invoking the postmaster.
 That will cause postmaster child processes to run with the normal
 <varname>oom_score_adj</> value of zero, so that the OOM killer can still
 target them at need.
@@ -1296,8 +1303,7 @@
 but may have a previous version of the same functionality called
 <filename>/proc/self/oom_adj</>.  This works the same except the disable
 value is <literal>-17</> not <literal>-1000</>.
 The corresponding
-build flag for <productname>PostgreSQL</> is
-<literal>-DLINUX_OOM_ADJ=0</>.
+value for <envar>PG_OOM_ADJUST_FILE</> should be <varname>oom_adj</>.
 </para>
 <note>
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index f6df2de..21559af 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,13 @@
 #endif
 
 #ifndef WIN32
+
+#ifdef __linux__
+static bool oom_env_checked = false;
+static char oom_adj_file[MAXPGPATH];
+static int	oom_adj_value = 0;
+#endif   /* __linux__ */
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
@@ -78,13 +85,38 @@ fork_process(void)
 	 * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you
 	 * want child processes to adopt here.
 	 */
-#ifdef LINUX_OOM_SCORE_ADJ
+#ifdef __linux__
+	if (!oom_env_checked)
+	{
+		char	   *env;
+
+		oom_env_checked = true;
+
+		env = getenv("PG_OOM_ADJUST_FILE");
+
+		/* Don't allow a path separator in file name */
+		if (env && !strchr(env, '/') && !strchr(env, '\\'))
+		{
+			snprintf(oom_adj_file, MAXPGPATH, "/proc/self/%s", env);
+
+			env = getenv("PG_OOM_ADJUST_VALUE");
+			if (env)
+			{
+				oom_adj_value = atoi(env);
+			}
+			else
+				oom_adj_value = 0;
+		}
+		else
+			oom_adj_file[0] = '\0';
+	}
+
 	{
 		/*
 		 * Use open() not stdio, to ensure we control the open flags. Some
 		 * Linux security environments reject anything but O_WRONLY.
 		 */
-		int			fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);
+		int			fd = open(oom_adj_file, O_WRONLY, 0);
 
 		/* We ignore all errors */
 		if (fd >= 0)
@@ -92,41 +124,14 @@ fork_process(void)
 			char		buf[16];
 			int			rc;
 
-			snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_SCORE_ADJ);
+			snprintf(buf, sizeof(buf), "%d\n", oom_adj_value);
 			rc = write(fd, buf, strlen(buf));
 			(void) rc;
 			close(fd);
 		}
 	}
-#endif   /* LINUX_OOM_SCORE_ADJ */
 
-	/*
-	 * Older Linux kernels have oom_adj not oom_score_adj.  This works
-	 * similarly except with a different scale of adjustment values.  If
-	 * it's necessary to build Postgres to work with either API, you can
-	 * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ.
-	 */
-#ifdef LINUX_OOM_ADJ
-	{
-		/*
-		 * Use open() not stdio, to ensure we control the open flags. Some
-		 * Linux security environments reject anything but O_WRONLY.
-		 */
-		int			fd = open("/proc/self/oom_adj", O_WRONLY, 0);
-
-		/* We ignore all errors */
-		if (fd >= 0)
-		{
-			char		buf[16];
-			int			rc;
-
-			snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
-			rc = write(fd, buf, strlen(buf));
-			(void) rc;
-			close(fd);
-		}
-	}
-#endif   /* LINUX_OOM_ADJ */
+#endif   /* __linux__ */
 
 	/*
 	 * Make sure processes do not share OpenSSL randomness state.
[HACKERS] Crash on backend exit w/ OpenLDAP [2.4.24, 2.4.31]
With LDAP support enabled, we link the backend with libldap, and we link libpq with libldap_r. Modules like dblink and postgres_fdw link to libpq, so loading them results in a backend having both libldap and libldap_r loaded. Those libraries export the same symbols, and the load order rule gives priority to the libldap symbols. So far, so good. However, both libraries declare a GCC destructor, ldap_int_destroy_global_options(), to free memory reachable from a global variable, ldap_int_global_options. In OpenLDAP versions 2.4.24 through 2.4.31, that variable's structure type has an incompatible layout in libldap vs. libldap_r. When the libldap_r build of the destructor uses pointers from the ldap_int_global_options of libldap, SIGSEGV ensues. This does not arise if nothing in the process ever caused OpenLDAP to initialize itself, because the relevant bytes then happen to be all-zero in either structure layout. OpenLDAP 2.4.32 fixed this by making the libldap structure a strict prefix of the libldap_r structure. The OpenLDAP change log merely says "Fixed libldap sasl handling (ITS#7118, ITS#7133)"; here are the relevant commits: http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commitdiff;h=270ef33acf18dc13bfd07f8a8e66b446f80e7d27 http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commitdiff;h=7ff18967d7d2e49baa9d80f1b9408b276f3982e0 You can cause the at-exit crash by building PostgreSQL against OpenLDAP 2.4.31, connecting with LDAP authentication, and issuing LOAD 'dblink'. Alternately, connect with any authentication method and create a dblink connection that uses an LDAP-based pg_service.conf entry. I'm attaching a test suite patch to illustrate that second vector. The popularity of the affected OpenLDAP versions is not clear to me. RHEL 5, 6 and 7 all ship OpenLDAP either old enough or new enough to miss the problem. Debian wheezy ships 2.4.31, an affected version. I have not looked beyond that.
I pondered what we could do short of treating this as a pure OpenLDAP bug for packagers to fix in their OpenLDAP builds, but I did not come up with anything singularly attractive. Some possibilities:

1. Link the backend with libldap_r, so we never face the mismatch. On some platforms, this means also linking in threading libraries.

2. Link a second copy of libpq against plain libldap and use that in server modules like dblink.

3. Wipe the memory of ldap_int_global_options so the destructor will have nothing to do. A challenge here is that we don't know the structure size; it's not part of any installed header, and it varies in response to OpenLDAP configuration options. Still, this is achievable in principle. This would be easy if the destructor function weren't static, alas.

4. Detect older OpenLDAP versions at runtime, just before we would otherwise initialize OpenLDAP, and raise an error. Possibly make the same check at compile time, for packager convenience.

5. Use only _exit(), not exit().

Hopefully I'm missing a great alternative, because each of those has something substantial to dislike about it. Thoughts welcome, especially from packagers dealing with platforms that use affected OpenLDAP versions. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index f237c43..fd61985 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -103,6 +103,18 @@
 SELECT *
 FROM dblink('SELECT * FROM foo') AS t(a int, b text, c text[])
 WHERE t.a > 7;
 ERROR:  connection not available
+-- The first-level connection's backend will crash on exit given OpenLDAP
+-- [2.4.24, 2.4.31].  Give the postmaster time to act on the crash.
+SELECT pg_sleep(1)
+FROM dblink('dbname=contrib_regression', $$SELECT * FROM
+	dblink('service=test_ldap dbname=contrib_regression',
+	       'select 1') t(c int)$$)
+AS t(c int);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 -- create a persistent connection
 SELECT dblink_connect('dbname=contrib_regression');
  dblink_connect 
diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql
index 2a10760..fa98285 100644
--- a/contrib/dblink/sql/dblink.sql
+++ b/contrib/dblink/sql/dblink.sql
@@ -65,6 +65,14 @@
 SELECT *
 FROM dblink('SELECT * FROM foo') AS t(a int, b text, c text[])
 WHERE t.a > 7;
+
+-- The first-level connection's backend will crash on exit given OpenLDAP
+-- [2.4.24, 2.4.31].  Give the postmaster time to act on the crash.
+SELECT pg_sleep(1)
+FROM dblink('dbname=contrib_regression', $$SELECT * FROM
+	dblink('service=test_ldap dbname=contrib_regression',
+	       'select 1') t(c int)$$)
+AS t(c int);
+
 -- create a persistent connection
 SELECT dblink_connect('dbname=contrib_regression');
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 4b31c0a..19bfe75 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Peter Geoghegan p...@heroku.com writes: ... In any case it's pretty clear that a goal of the glibc implementation is to concentrate as much entropy as possible into the first part of the string, and that's the important point. This makes perfect sense, and is why I was so incredulous about the Mac behavior. I think this may be another facet of something we were already aware of, which is that the UTF8 locales on OS X pretty much suck. It's fairly clear that Apple has put no effort into achieving more than minimal standards compliance for those. Sorting doesn't work as expected in those locales, for example. Still, that's reality, and any proposal to rely on strxfrm is going to have to deal with it :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On Thu, Jun 12, 2014 at 02:53:23PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote: It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? The pg_upgrade test suite covers this well. Um, not really: what pg_upgrade exercises is pg_dump -s which entirely fails to cover the data-transfer code paths. It would not have found this problem. I see. TAP suite coverage for a data-included dump/restore would be worth its weight, agreed. BTW, after further testing I realized that it was quite accidental that I found it either. pg_restore only uses libpq's lo_create() function when restoring an old_blob_style archive, ie one generated by 8.4 or earlier. That's what I happened to try to do last night, but it's pure luck that I did. That could have easily remained undiscovered until release day. Not good. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Thanks for looking into this. On Thu, Jun 12, 2014 at 9:25 AM, Robert Haas robertmh...@gmail.com wrote: Still, it's fair to say that on this Linux system, the first 8 bytes capture a significant portion of the entropy of the first 8 bytes of the string, whereas on MacOS X you only get entropy from the first 2 bytes of the string. It would be interesting to see results from other platforms people might care about also. Right. It was a little bit incautious of me to say that we had the full benefit of 8 bytes of storage with en_US.UTF-8, since that is only true of lower case characters (I think that FreeBSD can play tricks with this. Sometimes, it will give you the benefit of 8 bytes of entropy for an 8 byte string, with only non-differentiating trailing bytes, so that the first 8 bytes of Aaaa are distinct from the first eight bytes of , while any trailing bytes are non-distinct for both). In any case it's pretty clear that a goal of the glibc implementation is to concentrate as much entropy as possible into the first part of the string, and that's the important point. This makes perfect sense, and is why I was so incredulous about the Mac behavior. After all, the Open Group's strcoll() documentation says: The strxfrm() and strcmp() functions should be used for sorting large lists. Sorting text is hardly an infrequent requirement -- it's almost the entire reason for having strxfrm() in the standard. You're always going to want to have each strcmp() find differences as early as possible. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On 6/11/14, 10:26 AM, Robert Haas wrote: Now, as soon as we introduce the concept that selecting from a table might not really mean read from the table but read from the table after applying this owner-specified qual, we're opening up a whole new set of attack surfaces. Every pg_dump is an opportunity to hack somebody else's account, or at least audit their activity. I'm in full agreement we should clearly communicate the issues around pg_dump in particular, because they can't necessarily be eliminated altogether without some major work that's going to take a while to finish. And if the work-around is some sort of GUC for killing RLS altogether, that's ugly but not unacceptable to me as a short-term fix. One of the difficult design requests in my inbox right now asks how pg_dump might be changed both to reduce its overlap with superuser permissions and to allow auditing of its activity. Those requests aren't going away; their incoming frequency is actually rising quite fast right now. They're both things people expect from serious SQL oriented commercial database products, and I'd like to see PostgreSQL continue to displace those as we reach feature parity in those areas. Any way you implement finer grained user permissions and auditing features will be considered a new attack vector when you use those features. The way the proposed RLS feature inserts an arbitrary function for reads has a similar new attack vector when you use that feature. I'm kind of surprised to see this turn into a hot button all of the sudden though, because my thought on all that so far has been a giant so what? This is what PostgreSQL does. You wanna write your own C code and then link the thing right into the server, so that bugs can expose data and crash the whole server? Not only can you shoot yourself in the foot that way, we supply a sample gun and bullets in contrib. 
How about writing arbitrary code in any one of a dozen server-side languages of wildly varying quality, then hooking that code so it runs as a trigger function whenever you change a row? PostgreSQL is *on it*; we love letting people write some random thing, and then running that random thing against your data as a side-effect of doing an operation. And if you like that...just wait until you learn about this half-assed rules feature we have too! And when the database breaks because the functions people inserted were garbage, that's their fault, not a cause for a CVE. And when someone blindly installs adminpack because it sounded like a pgAdmin requirement, lets a monitoring system run as root so it can watch pg_stat_activity, and then discovers that pair of reasonable decisions suddenly means any fool with monitoring access can call pg_file_unlink...that's their fault too. These are powerful tools with serious implications, and they're expected to be used by equally serious users. We as a development community do need to put a major amount of work into refactoring all of these security mechanisms. There should be less of these embarrassing incidents where bad software design really forced the insecure thing to happen, which I'd argue is the case for that pg_stat_activity example. And luckily so far development resources are appearing for organizations I know of working in that direction recently, as fast as the requirements are rising. I think there's a good outcome at the end of that road. But let's not act like RLS is a scary bogeyman because it introduces a new way to hack the server or get surprising side-effects. That's expected and possibly unavoidable behavior in a feature like this, and there are much worse instances of arbitrary function risk throughout the core code already. 
-- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How to change the pgsql source code and build it??
Hello, I need to initialize the db as root and start the database server. In order to accomplish this, I modified the initdb.c source file of the pgsql package and tried to compile it. Even though the build was successful, I couldn't see the root user able to execute the initdb executable generated by the build. I wanted to know if there is any other procedure for building the postgresql package? Thanks, Shreesha. P.S. Below are the changes done in initdb.c (the disabled root check is commented out):

---

static char *
get_id(void)
{
#ifndef WIN32
	struct passwd *pw;

//	if (geteuid() == 0)			/* 0 is root's uid */
	/*
	{
		fprintf(stderr,
				_("%s: cannot be run as root\n"
				  "Please log in (using, e.g., \"su\") as the "
				  "(unprivileged) user that will\n"
				  "own the server process.\n"),
				progname);
		exit(1);
	}
	*/
	...
}
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
Greg, all, I will reply to the emails in detail when I get a chance but am out of town at a funeral, so it'll likely be delayed. I did want to echo my agreement for the most part with Greg and in particular... On Thursday, June 12, 2014, Gregory Smith gregsmithpg...@gmail.com wrote: On 6/11/14, 10:26 AM, Robert Haas wrote: Now, as soon as we introduce the concept that selecting from a table might not really mean read from the table but read from the table after applying this owner-specified qual, we're opening up a whole new set of attack surfaces. Every pg_dump is an opportunity to hack somebody else's account, or at least audit their activity. I'm in full agreement we should clearly communicate the issues around pg_dump in particular, because they can't necessarily be eliminated altogether without some major work that's going to take a while to finish. And if the work-around is some sort of GUC for killing RLS altogether, that's ugly but not unacceptable to me as a short-term fix. A GUC which is enable / disable / error-instead may work quite well, with error-instead as the pg_dump default if people really want it (there would have to be a way to disable that though, imv). Note that enable is the default in general, disable would be for superuser only (or on start-up) to disable everything, and error-instead anyone could use, but it would error instead of implementing RLS when querying an RLS-enabled table. This approach was suggested by an existing user testing out this RLS approach, to be fair, but it looks pretty sane to me as a way to address some of these concerns. Certainly open to other ideas and thoughts though. Thanks, Stephen
Re: [HACKERS] How to change the pgsql source code and build it??
At 2014-06-12 16:08:05 -0700, shreesha1...@gmail.com wrote: I need to initialize the db as the root and start the database server. Why? -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to change the pgsql source code and build it??
I need to port pgsql onto a controller which doesn't have a framework of creating multiple users for administrative purposes. The entire controller is managed by a single root user and that is the reason I am trying to change the pgsql initdb behavior. Do you think of any other better alternative? On Thu, Jun 12, 2014 at 5:40 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-06-12 16:08:05 -0700, shreesha1...@gmail.com wrote: I need to initialize the db as the root and start the database server. Why? -- Abhijit -- ~Shreesha.
Re: [HACKERS] How to change the pgsql source code and build it??
Hi, I need to port pgsql onto a controller which doesn't have a framework of creating multiple users for administrative purposes. The entire controller is managed by a single root user and that is the reason I am trying to change the pgsql initdb behavior. Do you think of any other better alternative? The reason you didn't see initdb completed is that it execs postgres on the way. As you know, it is strongly discouraged on an ordinary environment, but that framework sounds to be a single-user environment like what MS-DOS was, where any security risk coming from that characteristic is acceptable. I could see initdb and postgres operating as root for the moment (which means any possible side-effect is not checked) by making changes at four points in the whole postgresql source tree. Perhaps only two of them are needed for your wish.

postgresql $ find . -type f -print | xargs grep -nH 'geteuid() == 0'
./src/backend/main/main.c:377:	if (geteuid() == 0)
./src/bin/pg_ctl/pg_ctl.c:2121:	if (geteuid() == 0)
./src/bin/initdb/initdb.c:778:	if (geteuid() == 0)		/* 0 is root's uid */
./src/bin/pg_resetxlog/pg_resetxlog.c:250:	if (geteuid() == 0)

Try replacing these conditions with (0 && geteuid() == 0) and you would see it run as root. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
On Thu, Jun 12, 2014 at 02:44:10AM -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> On 2014-06-12 00:38:36 -0400, Noah Misch wrote:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07
>> Hm. My guess is that it's just a 'harmless' concurrency issue. The test
>> currently runs concurrently with others. I think what happens is that the
>> table gets dropped in the other session after the query has acquired the
>> MVCC snapshot (used for the pg_class scan). But why is it triggering on
>> such an 'unusual' system and not on others? That's what worries me a bit.

I can reproduce a similar disturbance in the test query using gdb and a
concurrent table drop, and the table reported in the prairiedog failure is
a table dropped in a concurrent test group. That explanation may not be
the full story behind these particular failures, but it certainly could
cause similar failures in the future. Let's prevent this by only reporting
rows for relations that still exist after the query is complete.

> prairiedog is pretty damn slow by modern standards. OTOH, I think it is
> not the slowest machine in the buildfarm; hamster for instance seems to
> be at least a factor of 2 slower. So I'm not sure whether to believe
> it's just a timing issue.

That kernel's process scheduler could be a factor.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index a182176..a274d82 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -2375,14 +2375,18 @@ Check constraints:
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
--- Check that we map relation oids to filenodes and back correctly.
--- Only display bad mappings so the test output doesn't change all the
--- time.
+-- Check that we map relation oids to filenodes and back correctly. Only
+-- display bad mappings so the test output doesn't change all the time.  A
+-- filenode function call can return NULL for a relation dropped concurrently
+-- with the call's surrounding query, so check mappings only for relations
+-- that still exist after all calls finish.
+CREATE TEMP TABLE filenode_mapping AS
 SELECT oid, mapped_oid, reltablespace, relfilenode, relname
 FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
 WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND
     mapped_oid IS DISTINCT FROM oid;
+SELECT m.* FROM filenode_mapping m JOIN pg_class c ON c.oid = m.oid;
  oid | mapped_oid | reltablespace | relfilenode | relname 
 -----+------------+---------------+-------------+---------
 (0 rows)
 
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 3f641f9..19e1229 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1582,15 +1582,20 @@ ALTER TABLE IF EXISTS tt8 SET SCHEMA alter2;
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
 
--- Check that we map relation oids to filenodes and back correctly.
--- Only display bad mappings so the test output doesn't change all the
--- time.
+-- Check that we map relation oids to filenodes and back correctly. Only
+-- display bad mappings so the test output doesn't change all the time.  A
+-- filenode function call can return NULL for a relation dropped concurrently
+-- with the call's surrounding query, so check mappings only for relations
+-- that still exist after all calls finish.
+CREATE TEMP TABLE filenode_mapping AS
 SELECT oid, mapped_oid, reltablespace, relfilenode, relname
 FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
 WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND
     mapped_oid IS DISTINCT FROM oid;
+SELECT m.* FROM filenode_mapping m JOIN pg_class c ON c.oid = m.oid;
+
 -- Checks on creating and manipulation of user defined relations in
 -- pg_catalog.
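The race described above can be pinned down without gdb by fixing the
scanning session's MVCC snapshot while another session drops a table. A
minimal sketch (the table name race_demo and the exact NULL behavior of the
filenode functions under a pinned snapshot are assumptions, not taken from
the thread):

```sql
-- Session 1: pin an MVCC snapshot that still sees the victim table.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM pg_class;   -- first query fixes the snapshot

-- Session 2: drop the table while session 1's snapshot is pinned.
DROP TABLE race_demo;

-- Session 1: the pg_class scan still returns the row under the old
-- snapshot, but the filenode lookup consults current catalog state and
-- can yield NULL, which IS DISTINCT FROM oid then reports as a bad mapping.
SELECT oid, pg_relation_filenode(oid) AS filenode
FROM pg_class
WHERE relname = 'race_demo';
COMMIT;
```

In the regression suite the same window opens without any explicit
transaction, because the dropping session runs in a concurrent test group.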
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
Noah Misch <n...@leadboat.com> writes:
> I can reproduce a similar disturbance in the test query using gdb and a
> concurrent table drop, and the table reported in the prairiedog failure
> is a table dropped in a concurrent test group. That explanation may not
> be the full story behind these particular failures, but it certainly
> could cause similar failures in the future.

Yeah, that seems like a plausible explanation, since the table shown in the
failure report is one that would be getting dropped concurrently, and the
discrepancy is that we get NULL rather than the expected value for the
pg_filenode_relation result, which is expected if the table is already
dropped when the mapping function is called.

> Let's prevent this by only reporting rows for relations that still exist
> after the query is complete.

I think this is a bad solution though; it risks masking actual problems.
What seems like a better fix to me is to change the test

	mapped_oid IS DISTINCT FROM oid

to

	mapped_oid <> oid

pg_class.oid will certainly never read as NULL, so what this will do is
allow the single case where the function returns NULL. AFAIK there is no
reason to suppose that a NULL result would mean anything except that the
table's been dropped, so changing it this way will allow only that case and
not any others.
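With that change, the test query from the patch upthread would read as
follows (a sketch; only the final predicate differs from the existing
regression test):

```sql
SELECT oid, mapped_oid, reltablespace, relfilenode, relname
FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
WHERE relkind IN ('r', 'i', 'S', 't', 'm')
  AND mapped_oid <> oid;   -- a NULL mapping no longer counts as a failure
```

Since `x <> y` evaluates to NULL (not true) when either side is NULL, a
concurrently dropped table silently falls out of the result instead of
showing up as a bogus bad mapping.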
Alternatively, we could do something like you suggest, but adjust the
second join so that it suppresses only rows in which mapped_oid is null
*and* there's no longer a matching OID in pg_class. That would provide
additional confidence that the null result is a valid indicator of a
just-dropped table.

			regards, tom lane
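The alternative sketched above would keep the temp-table approach from the
patch but filter its second query, suppressing a row only when the NULL
mapping coincides with the relation having vanished. A sketch, reusing the
filenode_mapping temp table from the patch upthread:

```sql
SELECT m.*
FROM filenode_mapping m
LEFT JOIN pg_class c ON c.oid = m.oid
WHERE m.mapped_oid IS NOT NULL   -- a genuine bad mapping is still reported
   OR c.oid IS NOT NULL;         -- NULL mapping but relation still exists:
                                 -- suspicious, so report it too
```

Only the combination "mapping came back NULL and the relation is gone from
pg_class" is treated as the benign concurrent-drop case.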
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
I wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Presumably we should also fix libpq to not be so dumb. I mean, it
>> doesn't help with the immediate problem, since as you say there could be
>> non-upgraded copies of libpq out there for the indefinite future, but it
>> still seems like something we oughta fix.
> It's been in the back of my mind for awhile that doing a dynamic query at
> all here is pretty pointless. It's not like the OIDs of those functions
> ever have or ever will move. It would probably be more robust if we just
> let libpq be a consumer of fmgroids.h and refer directly to the constants
> F_LO_CREATE etc.

I thought a bit more about this. Ignore for the moment the larger question
of whether we want to consider fmgroids.h as something we'd export to
clients outside the immediate core infrastructure; that will definitely
take more thought than we can expend if we want to slip this into 9.4. It
still seems reasonable for libpq to use it.

The actual code changes in fe-lobj.c are trivial enough (and would consist
mostly of code removal). We would need to deal with the fact that some of
the support functions are not present in older backends, but I think
testing PQserverVersion is adequate for that purpose.

The hard part seems to be making sure that fmgroids.h is available to
reference, since it's a generated file and not guaranteed to be there a
priori. In a gmake-driven build I have the skillz to deal with that, but I
am not sure what to do in the various Windows build systems, especially
for the client-code-only build methods. The path of least resistance would
be to just assume fmgroids.h is available, which would work fine when
building from a tarball, or probably when doing a full build including the
backend (MSVC builds aren't parallel, are they?). But what about a
client-only build?

			regards, tom lane
[HACKERS] unable to attach client process to postgresql server using gdb
Hi,

I am working with the PostgreSQL 9.3.4 source using the Eclipse IDE on
Ubuntu 14.04, and I am facing a problem attaching a client process to the
PostgreSQL server under gdb in order to debug it.

When I start the postmaster and then connect to it from a client in a
terminal, it works fine and queries get responses. But when I run the
debug configuration from Eclipse and select the postgres process id from
the list, I get an error saying:

*Can't find a source file at
/build/buildd/eglibc-2.19/socket/../sysdeps/unix/sysv/linux/x86_64/recv.c*
*Locate the file or edit the source lookup path to include its location.*

After this, when I send any query from the client, it just gets stuck and
no response comes. After attaching gdb to the client process, the client
does not get any response from the postgres server.

One thing to note is that I was able to debug properly until yesterday,
but now it is not working. I tried reinstalling, but that did not help.
How can I fix this issue? Kindly help.

Rajmohan