Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
Peter Geoghegan wrote: Some of you will be aware that I've tried to formulate a good general recommendation for setting the value of commit_delay, in light of the improvements made in 9.3. I previously asked for help finding a methodology for optimising commit_delay [1]. The documentation related to commit_delay still says that we don't know what might work well, though I don't think that's true any more. I found the time to do some benchmarks of my own - Greg Smith made available a server that he frequently uses for benchmarks. It was previously my observation that half of raw single-page sync time as reported by pg_test_fsync for your wal_sync_method worked best for commit_delay. I went so far as to modify pg_test_fsync to output average time per op in microseconds for each operation with commit_delay in mind, which is a patch that has already been committed [2]. [...] Attached is a doc-patch that makes recommendations that are consistent with my observations about what works best here. I'd like to see us making *some* recommendation - for sympathetic cases, setting commit_delay appropriately can make a very large difference to transaction throughput. Such sympathetic cases - many small write transactions - are something that tends to be seen relatively frequently with web applications, which disproportionately use cloud hosting. It isn't at all uncommon for these cases to be highly bound by their commit rate, and so it is compelling to try to amortize the cost of a flush as effectively as possible there. It would be unfortunate if no one was even aware that commit_delay is now useful for these cases, since the setting allows cloud hosting providers to help these cases quite a bit, without having to do something like compromise durability, which in general isn't acceptable. The point of all these benchmarks isn't to show how effective commit_delay now is, or can be - we already had that discussion months ago, before the alteration to its behaviour was committed. The point is to put my proposed doc changes in the appropriate context, so that I can build confidence that they're balanced and helpful, by showing cases that are not so sympathetic. Thoughts?

If there is an agreement that half the sync time as reported by pg_test_fsync is a good value, would it make sense to have initdb test sync time and preset commit_delay?

Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
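In concrete terms, the rule of thumb quoted above works out as follows (a worked example with illustrative numbers, not measurements from Greg's benchmark server):

    \[ \text{commit\_delay} \approx \frac{1}{2} \cdot \frac{10^{6}}{\text{ops/sec reported by pg\_test\_fsync}} \ \mu\text{s} \]

For example, if pg_test_fsync reports 250 ops/sec for your chosen wal_sync_method, one flush costs roughly 4000 microseconds, suggesting commit_delay = 2000.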
Re: [HACKERS] feature proposal - triggers by semantics
On 11/15/2012 03:44 PM, Darren Duncan wrote: As they currently exist, triggers always fire based on certain SQL syntax used, rather than on the semantics of what is actually going on.

That's not quite right. COPY fires INSERT triggers, despite never using an explicit INSERT statement. There are probably other examples.

As a simple example, I'd like to be able to define a trigger like AFTER DELETE ON foo FOR EACH ROW and have that trigger be invoked not only by a DELETE on foo but also by a TRUNCATE on foo. So I would like to do some auditing action when a row of foo is deleted, no matter how it happens.

TRUNCATE triggers already exist. If you wanted, you could have a BEFORE TRUNCATE trigger SELECT each row and invoke a function on each row, same as a FOR EACH ROW .. DELETE trigger. So you can already achieve what you want to do with the currently available features.

The reason this particular example is important is that TRUNCATE is documented as a data-manipulation action semantically equivalent to an unqualified DELETE in its effects, primarily.

I'd say we should just change the docs to note that TRUNCATE does not act on each row individually, so it does not fire DELETE triggers. There is a TRUNCATE trigger you can use to monitor and control TRUNCATE behaviour.

So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH ROW, but I'm also proposing the ability to generally define triggers based not on the syntax used but the actual action requested.

That already exists. Maybe you're using some old version?

regress=> \h CREATE TRIGGER
Command:     CREATE TRIGGER
Description: define a new trigger
Syntax:
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON table_name
    ...
where event can be one of:
    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

A tangential feature request is to provide a runtime config option that can cause TRUNCATE to always behave as unqualified DELETE FROM regardless of any triggers, as if it were just a syntactic shorthand.

Please, no! If you want to prevent TRUNCATE, deny the privilege or add a trigger that aborts the command. TRUNCATE should do what it says, not magically be changed to do something else by a GUC.

Or alternately/also provide extra syntax to TRUNCATE itself where one can specify which behavior to have, and both options can be given explicitly to override any config option.

What advantage would that have over using DELETE when you want DELETE, and TRUNCATE when you want TRUNCATE?

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
On 11/15/12 12:19 AM, Albe Laurenz wrote: If there is an agreement that half the sync time as reported by pg_test_fsync is a good value, would it make sense to have initdb test sync time and preset commit_delay?

Peter has validated this against a good range of systems, but it would be optimistic to say it's a universal idea. My main concern with this would be the relatively common practice of moving the pg_xlog directory after initdb time. Sometimes people don't know about the option to have initdb move it. Sometimes the drive to hold pg_xlog isn't available yet when the database is originally created. And the camp I fall into (which admittedly is the group who can take care of this on their own) will move pg_xlog manually and symlink it on their own, because that's what we're used to.

I would rather see this just turn into one of the things a more general tuning tool knew how to do, executing against a fully setup system. Having a useful implementation of commit_delay and useful docs on it seems like enough of a jump forward for one release. Moving fully into auto-tuning before getting more field feedback on how that works out is pretty aggressive.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
On 11/15/2012 04:56 PM, Greg Smith wrote: I would rather see this just turn into one of the things a more general tuning tool knew how to do, executing against a fully setup system. Having a useful implementation of commit_delay and useful docs on it seems like enough of a jump forward for one release. Moving fully into auto-tuning before getting more field feedback on how that works out is pretty aggressive.

It'll also potentially make it harder to get reproducible results in benchmarking and testing across repeated runs, cause confusion when someone relocates a DB or changes hardware, and slow down initdb (and thus testing). I'd be all for making it part of a "test my hardware and tune my DB" tool, but not such a fan of doing it at initdb time. Making initdb less predictable and more complicated sounds like asking for trouble.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v3
Hi, On Thursday, November 15, 2012 05:08:26 AM Michael Paquier wrote: Looks like cool stuff @-@ I might be interested in looking at that a bit, as I think I will hopefully be able to grab some time in the next couple of weeks. Are some of those patches already submitted to a CF?

I added the patchset as one entry to the CF this time; it seems to me they are too hard to judge individually to make them really separately reviewable. I can split it off there, but really all the complicated stuff is in one patch anyway...

Greetings, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Timing events WIP v1
Attached is a first WIP saving Timing Events via a new hook, grabbed by a new pg_timing_events contrib module. This only implements a small portion of the RFP spec I presented earlier this month, and is by no means finished code looking for a regular review. This is just enough framework to do something useful while looking for parts that I suspect have tricky details. Right now this saves just a single line of text and inserts the hook into the checkpoint code, so output looks like this:

$ psql -c "select * from pg_timing_events" -x
-[ RECORD 1 ]-
seq   | 0
event | check/restartpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s

Extending this to save the key/value set and most of the other data I mentioned before is pretty straightforward. There is a fair amount of debris in here from the pg_stat_statements code this started as. I've left a lot of that here but commented out, because it gives examples of grabbing things like the user and database I'll need soon. There's a modest list of things that I've done or am facing soon that involve less obvious choices though, and I would appreciate feedback on these things in particular:

-This will eventually aggregate data from clients running queries, the checkpointer process, and who knows what else in the future. The most challenging part of this so far is deciding how to handle the memory management side. I've tried just dumping everything into TopMemoryContext, and that not working as hoped is the biggest known bug right now. I'm not decided on whether to continue doing that but resolve the bug; if there is an alternate context that makes more sense for a contrib module like this; or if I should fix the size of the data and allocate everything at load time. The size of the data from log_lock_waits in particular will be hard to estimate in advance.

-The event queue is a simple array accessed in circular fashion. After realizing that output needed to navigate in the opposite order of element addition, I ended up just putting all the queue navigation code directly into the add/output routines. I'd be happy to use a more formal Postgres type here instead--I looked at SHM_QUEUE for example--but I haven't found something yet that makes this any simpler.

-I modeled the hook here on the logging one that went into 9.2. It's defined in its own include file and it gets initialized by the logging system. No strong justification for putting it there, it was just similar to the existing hook and that seemed reasonable. This is in some respects an alternate form of logging after all. The data sent by the hook itself needs to be more verbose in order to handle the full spec; right now I'm just asking about placement.

-I'm growing increasingly worried about allowing concurrent reads of this data (which might be large and take a while to return to the client) without blocking insertions. The way that was split to improve pg_stat_statements concurrency doesn't convert naturally to this data. The serial number field in here is part of one idea I had. I might grab the header with an exclusive lock for only a moment and look up the serial number of the last record I should display. Then I could drop to a share lock looping over the elements, stopping if I find one with a serial number over that.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

diff --git a/contrib/pg_timing_events/Makefile b/contrib/pg_timing_events/Makefile
new file mode 100644
index 0000000..3e7f1ab
--- /dev/null
+++ b/contrib/pg_timing_events/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_timing_events/Makefile
+
+MODULE_big = pg_timing_events
+OBJS = pg_timing_events.o
+
+EXTENSION = pg_timing_events
+DATA = pg_timing_events--1.0.sql pg_timing_events--unpackaged--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_timing_events
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_timing_events/pg_timing_events--1.0.sql b/contrib/pg_timing_events/pg_timing_events--1.0.sql
new file mode 100644
index 0000000..ccf6372
--- /dev/null
+++ b/contrib/pg_timing_events/pg_timing_events--1.0.sql
@@ -0,0 +1,43 @@
+/* contrib/pg_timing_events/pg_timing_events--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_timing_events" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_timing_events_reset()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_timing_events(
+    OUT seq int8,
+    OUT event text
+--    OUT occured timestamp
+
+--    OUT userid oid,
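The serial-number scheme Greg sketches in his last point above could look something like the following. This is a rough sketch only: the names (teEntry, teHeader, te_lock, te_add, te_output_all) are made up, the payload is fixed-size for simplicity, and te_lock would be assigned with LWLockAssign() at module load (not shown).

    #include "postgres.h"
    #include "storage/lwlock.h"

    #define TE_QUEUE_SIZE 1024

    typedef struct teEntry
    {
        uint64      seq;            /* serial number; 0 means "never written" */
        char        event[256];     /* fixed-size payload, for this sketch */
    } teEntry;

    typedef struct teHeader
    {
        uint64      next_seq;       /* serial the next insert takes; starts at 1 */
        int         head;           /* next array slot to overwrite */
        teEntry     entries[TE_QUEUE_SIZE];
    } teHeader;

    static LWLockId te_lock;        /* assigned via LWLockAssign(), not shown */

    static void
    te_add(teHeader *q, const char *msg)
    {
        LWLockAcquire(te_lock, LW_EXCLUSIVE);
        q->entries[q->head].seq = q->next_seq++;
        strlcpy(q->entries[q->head].event, msg, sizeof(q->entries[q->head].event));
        q->head = (q->head + 1) % TE_QUEUE_SIZE;
        LWLockRelease(te_lock);
    }

    static void
    te_output_all(teHeader *q)
    {
        uint64      cutoff;
        int         i;

        /* exclusive lock held only long enough to fix the cutoff serial */
        LWLockAcquire(te_lock, LW_EXCLUSIVE);
        cutoff = q->next_seq;
        LWLockRelease(te_lock);

        /* then walk the ring under a share lock, newest entry first */
        LWLockAcquire(te_lock, LW_SHARED);
        for (i = 0; i < TE_QUEUE_SIZE; i++)
        {
            int         slot = (q->head - 1 - i + TE_QUEUE_SIZE) % TE_QUEUE_SIZE;
            teEntry    *e = &q->entries[slot];

            if (e->seq == 0 || e->seq >= cutoff)
                continue;       /* empty, or (over)written after the cutoff */
            /* ... emit e->seq and e->event to the output tuplestore ... */
        }
        LWLockRelease(te_lock);
    }

Skipping any entry whose serial is at or past the cutoff handles both rows inserted after the snapshot was taken and old rows those inserts overwrote, so the share-lock scan never shows a torn view.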
Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
On Thu, Nov 15, 2012 at 9:56 AM, Greg Smith g...@2ndquadrant.com wrote: On 11/15/12 12:19 AM, Albe Laurenz wrote: If there is an agreement that half the sync time as reported by pg_test_fsync is a good value, would it make sense to have initdb test sync time and preset commit_delay?

Peter has validated this against a good range of systems, but it would be optimistic to say it's a universal idea. My main concern with this would be the relatively common practice of moving the pg_xlog directory after initdb time. Sometimes people don't know about the option to have initdb move it. Sometimes the drive to hold pg_xlog isn't available yet when the database is originally created. And the camp I fall into (which admittedly is the group who can take care of this on their own) will move pg_xlog manually and symlink it on their own, because that's what we're used to.

An even more common use case for this, I think, is "I installed from a package that ran initdb for me"... I still think manual moving of pg_xlog is a lot more common than using the initdb option in general.

I would rather see this just turn into one of the things a more general tuning tool knew how to do, executing against a fully setup system. Having a useful implementation of commit_delay and useful docs on it seems like enough of a jump forward for one release. Moving fully into auto-tuning before getting more field feedback on how that works out is pretty aggressive.

+1.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature proposal - triggers by semantics
A row-level trigger for TRUNCATE does not really make sense, as it would mean that TRUNCATE would need to scan each tuple of the table it interacts with in order to fire its trigger, so it would more or less achieve the same performance as a plain DELETE FROM table;. TRUNCATE is fast precisely because it only removes the tuples wholesale and does not scan them. I am also not sure it is meant for scanning (btw, there are discussions about TRUNCATE these days, so I might be missing something). So, what you are looking for can simply be solved with row-level triggers on DELETE, so why not use that and avoid reinventing the wheel? -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] feature proposal - triggers by semantics
On 11/15/2012 09:48 AM, Craig Ringer wrote: If you want to prevent TRUNCATE, deny the privilege or add a trigger that aborts the command.

You can abort the transaction but not skip the action, as currently it is only possible to skip in ROW-level triggers. So I'd modify this request to allow BEFORE EACH STATEMENT triggers to also be able to silently skip the current action, like BEFORE EACH ROW triggers can. Then this request would simply be satisfied by a simple trigger which rewrites TRUNCATE into DELETE.

Hannu

TRUNCATE should do what it says, not magically be changed to do something else by a GUC. Or alternately/also provide extra syntax to TRUNCATE itself where one can specify which behavior to have, and both options can be given explicitly to override any config option. What advantage would that have over using DELETE when you want DELETE, and TRUNCATE when you want TRUNCATE?

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 11/14/12 6:28 PM, Kevin Grittner wrote: - Documentation is incomplete. ... - There are no regression tests yet.

Do you have any simple test cases you've been using that you could attach? With epic new features like this, when things don't work it's hard to distinguish between "that just isn't implemented yet" and "the author never tested that". Having some known good samples you have tested, even if they're not proper regression tests, would be helpful for establishing that the code baseline works.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, November 15, 2012 2:02 AM Atri Sharma wrote: On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma atri.j...@gmail.com wrote: On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote: Following the sig is a first cut at a patch (written by Atri) that

PFA below the sig the updated patch for the same. It maintains a cache cachedVisibilityXid which holds the result of the clog visibility check. cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid. In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see if the commit status of the Xmin or Xmax of a tuple can be retrieved from the cache.

1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you. I can validate the performance numbers once again and do the code review for this patch during CF-3. However, I am just not very sure about the use case, i.e. whether it is a sufficient use case, so I would like to ask the opinion of other people as well.

2. After this patch, tuple hint bits are not set by SELECT operations after data is populated by one transaction. This appears to be good, as it will save many ops (page dirty followed by flush, clog inquiry). Though it is not apparent, we should see whether it can have any other impact:
a. maybe now VACUUM needs to set the hint bits, which may cause more I/O during VACUUM.

Hackers, any opinion/suggestions about the use case?

With Regards, Amit Kapila.
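For readers who have not opened the patch, the described cache boils down to something like this. The static variable names follow the message; the wrapper function name is hypothetical, and TransactionLogFetch() is the existing clog lookup in transam.c (static there, so the cache would live alongside it). Note that a real implementation must only cache final states, since an in-progress answer can change:

    #include "postgres.h"
    #include "access/clog.h"
    #include "access/transam.h"

    /* one-entry cache of the last clog lookup, per the description above */
    static TransactionId cachedVisibilityXid = InvalidTransactionId;
    static XidStatus cachedVisibilityXidStatus = TRANSACTION_STATUS_IN_PROGRESS;

    /* hypothetical wrapper the visibility routines would call */
    static XidStatus
    CachedTransactionLogFetch(TransactionId xid)
    {
        XidStatus   status;

        if (TransactionIdEquals(xid, cachedVisibilityXid))
            return cachedVisibilityXidStatus;   /* hit: no clog access */

        status = TransactionLogFetch(xid);      /* existing clog lookup */

        /* only remember final states; IN_PROGRESS can change under us */
        if (status == TRANSACTION_STATUS_COMMITTED ||
            status == TRANSACTION_STATUS_ABORTED)
        {
            cachedVisibilityXid = xid;
            cachedVisibilityXidStatus = status;
        }
        return status;
    }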
Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
On 15 November 2012 10:04, Magnus Hagander mag...@hagander.net wrote: I would rather see this just turn into one of the things a more general tuning tool knew how to do, executing against a fully setup system. Having a useful implementation of commit_delay and useful docs on it seems like enough of a jump forward for one release. Moving fully into auto-tuning before getting more field feedback on how that works out is pretty aggressive. +1. I am inclined to agree. I did attempt to use the instrumentation macros to have commit_delay set adaptively at runtime, which would have at least addressed this concern, but that just didn't work as well as this. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature proposal - triggers by semantics
On 11/15/2012 06:25 PM, Hannu Krosing wrote: On 11/15/2012 09:48 AM, Craig Ringer wrote: If you want to prevent TRUNCATE, deny the privilege or add a trigger that aborts the command.

You can abort the transaction but not skip the action, as currently it is only possible to skip in ROW-level triggers. So I'd modify this request to allow BEFORE EACH STATEMENT triggers to also be able to silently skip the current action, like BEFORE EACH ROW triggers can. Then this request would simply be satisfied by a simple trigger which rewrites TRUNCATE into DELETE.

That seems sensible to me, too.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
Here's an updated version of this patch, rebased with master, including the recent replication timeout changes, and some other cleanup. On 12.10.2012 09:34, Amit Kapila wrote: The test is finished from myside. one more issue: ... ./pg_basebackup -P -D ../../data_sub -X fetch -p 2303 pg_basebackup: COPY stream ended before last file was finished Fixed this. However, the test scenario you point to here: http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@kap...@huawei.com still seems to be broken, although I get a different error message now. I'll dig into this.. - Heikki streaming-tli-switch-5.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
On 15 November 2012 08:56, Greg Smith g...@2ndquadrant.com wrote: My main concern with this would be the relatively common practice of moving the pg_xlog directory after initdb time. I probably should have increased the default number of seconds that pg_test_fsync runs for, in light of the fact that that can make an appreciable difference when following this method. I'd suggest that a value of 5 be used there. I think we should just change that, since everyone will just use the default anyway (or, the more cautious ones will use a higher value than the default, if anything). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On 10.10.2012 17:26, Amit Kapila wrote: 36.

+SendTimeLineHistory(TimeLineHistoryCmd *cmd)
{
..
		if (nread <= 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							path)));

FileClose should be done in error case as well.

Hmm, I think you're right. The straightforward fix to just call FileClose() before the ereport()s in that function would not be enough, though. You might run out of memory in pq_sendbytes(), for example, which would throw an error. We could use PG_TRY/CATCH for this, but seems like overkill. Perhaps the simplest fix is to use a global (static) variable for the fd, and clean it up in WalSndErrorCleanup().

This is a fairly general issue, actually. Looking around, I can see at least two similar cases in existing code, with BasicOpenFile, where we will leak file descriptors on error:

copy_file: there are several error cases, including out-of-disk-space, with no attempt to close the fds.

XLogFileInit: again, no attempt to close the file descriptor on failure. This is called at checkpoint from the checkpointer process, to preallocate new xlog files.

Given that we haven't heard any complaints of anyone running into these, these are not a big deal in practice, but in theory at least the XLogFileInit leak could lead to serious problems, as it could cause the checkpointer to run out of file descriptors.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
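The static-variable approach Heikki suggests might look roughly like this. This is only a sketch, not the eventual fix: the variable name is made up, the TimeLineHistoryCmd field name is taken from the patch under discussion, and TLHistoryFilePath/PathNameOpenFile/FileClose are the existing xlog_internal.h and fd.c facilities:

    #include "postgres.h"
    #include <fcntl.h>
    #include "access/xlog_internal.h"
    #include "storage/fd.h"

    /* kept static so the error path below can find it after a longjmp */
    static File tlhistory_fd = -1;

    static void
    SendTimeLineHistory(TimeLineHistoryCmd *cmd)
    {
        char        path[MAXPGPATH];

        TLHistoryFilePath(path, cmd->timeline);

        tlhistory_fd = PathNameOpenFile(path, O_RDONLY | PG_BINARY, 0666);
        if (tlhistory_fd < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\": %m", path)));

        /* ... reads and pq_sendbytes() may ereport(ERROR) freely here ... */

        FileClose(tlhistory_fd);
        tlhistory_fd = -1;
    }

    /* invoked from the walsender's sigsetjmp() error path */
    static void
    WalSndErrorCleanup(void)
    {
        if (tlhistory_fd >= 0)
        {
            FileClose(tlhistory_fd);
            tlhistory_fd = -1;
        }
    }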
Re: [HACKERS] Switching timeline over streaming replication
On 15.11.2012 12:44, Heikki Linnakangas wrote: Here's an updated version of this patch, rebased with master, including the recent replication timeout changes, and some other cleanup. On 12.10.2012 09:34, Amit Kapila wrote: The test is finished from myside. one more issue: ... ./pg_basebackup -P -D ../../data_sub -X fetch -p 2303 pg_basebackup: COPY stream ended before last file was finished Fixed this. However, the test scenario you point to here: http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@kap...@huawei.com still seems to be broken, although I get a different error message now. I'll dig into this.. Ok, here's an updated patch again, with that bug fixed. - Heikki streaming-tli-switch-6.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On Thursday, November 15, 2012 6:05 PM Heikki Linnakangas wrote: On 15.11.2012 12:44, Heikki Linnakangas wrote: Here's an updated version of this patch, rebased with master, including the recent replication timeout changes, and some other cleanup. On 12.10.2012 09:34, Amit Kapila wrote: The test is finished from myside. one more issue: ... ./pg_basebackup -P -D ../../data_sub -X fetch -p 2303 pg_basebackup: COPY stream ended before last file was finished Fixed this. However, the test scenario you point to here: http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e77 10$@kap...@huawei.com still seems to be broken, although I get a different error message now. I'll dig into this.. Ok, here's an updated patch again, with that bug fixed. I shall review and test the updated Patch in Commit Fest. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On Tue, Oct 2, 2012 at 8:22 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 2 October 2012 18:16, Tom Lane t...@sss.pgh.pa.us wrote:

1. Why isn't something like md5() on the reported query text an equally good solution for users who want a query hash?

Because that does not uniquely identify the entry. The very first thing that the docs say on search_path is "Qualified names are tedious to write, and it's often best not to wire a particular schema name into applications anyway". Presumably, the reason it's best not to wire schema names into apps is because it might be useful to modify search_path in a way that dynamically made the same queries in some application reference what are technically distinct relations. If anyone does this, and it seems likely that many do for various reasons, they will be out of luck when using some kind of pg_stat_statements aggregation. This was the behaviour that I intended for pg_stat_statements all along, and I think it's better than a solution that matches query strings.

2. If people are going to accumulate stats on queries over a long period of time, is a 32-bit hash really good enough for the purpose? If I'm doing the math right, the chance of collision is already greater than 1% at 10,000 queries, and rises to about 70% for 100,000 queries; see http://en.wikipedia.org/wiki/Birthday_paradox We discussed this issue and decided it was okay for pg_stat_statements's internal hash table, but it's not at all clear to me that it's sensible to use 32-bit hashes for external accumulation of query stats.

Well, forgive me for pointing this out, but I did propose that the hash be a 64-bit value (which would have necessitated adapting hash_any() to produce 64-bit values), but you rejected the proposal. I arrived at the same probability for a collision as you did and posted it to the list, in discussion shortly after the normalisation stuff was committed. What was the argument for rejecting it? Just that it required hash_any() to be adapted?

Now that we have a very clear use case where this would help, perhaps it's time to re-visit this proposal?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
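For reference, those figures follow directly from the standard birthday approximation with N = 2^32 possible hash values:

    \[ p(n) \approx 1 - e^{-n^{2}/(2N)}, \qquad p(10{,}000) \approx 1.2\%, \qquad p(100{,}000) \approx 69\% \]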
Re: [HACKERS] Hash id in pg_stat_statements
On 15 November 2012 13:10, Magnus Hagander mag...@hagander.net wrote: Well, forgive me for pointing this out, but I did propose that the hash be a 64-bit value (which would have necessitated adapting hash_any() to produce 64-bit values), but you rejected the proposal. I arrived at the same probability for a collision as you did and posted it to the list, in discussion shortly after the normalisation stuff was committed. What was the argument for rejecting it? Just that it required hash_any() to be adapted?

I think so, yes. It just wasn't deemed necessary. Note that hash_any() has a comment that says something like "if you want to adapt this so that the Datum returned is a 64-bit integer, the way to do that is ...". I followed this advice in a revision of the pg_stat_statements normalisation patch, and preserved compatibility by using a macro, which Tom objected to.

Now that we have a very clear use case where this would help, perhaps it's time to re-visit this proposal?

Perhaps. I think that the issue of whether or not a 64-bit integer is necessary is totally orthogonal to the question of whether or not we should expose the hash. The need to aggregate historical statistics just doesn't appreciably alter things here, I feel. The number of discrete queries that an application will execute in a week just isn't that different from the number that it will ever execute, I suspect.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Monday, November 12, 2012 8:23 PM Fujii Masao wrote: On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote: On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote: On 19.10.2012 14:42, Amit kapila wrote: On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:

Are you planning to introduce the timeout mechanism in the pg_basebackup main process? Or the background process? It's useful to implement both.

By background process, you mean ReceiveXlogStream?

For both.

I think for the background process, it can be done in a way similar to what we have done for walreceiver. But I have some doubts about how to do it for the main process: logic similar to walreceiver can not be used in case the network goes down while getting the other database files from the server. The reason is that, to receive the data files, PQgetCopyData() is called in synchronous mode, so it keeps waiting for infinite time till it gets some data. In order to solve this issue, I can think of the following options:

1. Making this call also asynchronous (but not sure about the impact of this).

+1 Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can solve the issue in a similar way to walreceiver's.

2. In function pqWait, instead of passing the hard-coded value -1 (i.e. infinite wait), we can send some finite time. This time can be received as a command line argument from the respective utility and set in the PGconn structure.

Yes, I think that we should add something like a --conninfo option to pg_basebackup and pg_receivexlog. We can easily set not only connect_timeout but also sslmode, application_name, ... by using such an option accepting a conninfo string.

I have prepared the attached patch to make pg_basebackup and pg_receivexlog non-blocking. To do so I had to add new command line parameters to pg_basebackup and pg_receivexlog; for now I added two more command line arguments:
a. -r for pg_basebackup and pg_receivexlog to take the receive time-out value. Default value for this parameter is 60 sec.
b. -t for pg_basebackup and pg_receivexlog to take the initial connection timeout value. Default value is infinite wait.
We can change this to accept --conninfo as well.

I feel that, apart from the above, the remaining problem is the function call PQgetResult():
1. Wherever a query is sent from BaseBackup, it calls the function PQgetResult to receive the result of the query. As PQgetResult() is a blocking function (it calls pqWait, which can hang), if the network is down before sending the query itself, then there will not be any result, so it will keep hanging in PQgetResult. IMO, it can be solved in the following ways:
a. Create one corresponding non-blocking function. But this function is called from inside some of the other libpq functions (PQexec->PQexecFinish->PQgetResult), so it can be a little tricky to solve this way.
b. Add the receive_timeout variable to the PGconn structure and use it in pqWait for the timeout whenever it is set.
c. Any other better way?

BTW, IIRC the walsender has no timeout mechanism while sending backup data to pg_basebackup. So it's also useful to implement the timeout mechanism for the walsender during backup.

What about using pq_putmessage_noblock()? I think maybe some more functions also need to be made non-blocking. I am still evaluating.

I will upload the attached patch to the commitfest if you don't have any objections. More suggestions/comments?

With Regards, Amit Kapila.
noblock_basebackup_and_receivexlog.patch Description: noblock_basebackup_and_receivexlog.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
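For those unfamiliar with the walreceiver pattern Fujii points to above, the asynchronous PQgetCopyData() loop looks roughly like this. It is a simplified sketch with a hypothetical helper name; error details are collapsed into the return code:

    #include <sys/select.h>
    #include "libpq-fe.h"

    /*
     * Fetch one COPY data row into *buffer, waiting at most timeout_sec for
     * the server.  Return value follows PQgetCopyData(): >0 row length,
     * -1 COPY finished, -2 error (also used here for a timeout).
     */
    static int
    copy_data_with_timeout(PGconn *conn, char **buffer, int timeout_sec)
    {
        for (;;)
        {
            fd_set      input_mask;
            struct timeval timeout;
            int         sock;
            int         rc;

            /* async flag = 1: return 0 instead of blocking when no row is ready */
            rc = PQgetCopyData(conn, buffer, 1);
            if (rc != 0)
                return rc;

            /* nothing buffered: wait on the socket, but only up to the timeout */
            sock = PQsocket(conn);
            FD_ZERO(&input_mask);
            FD_SET(sock, &input_mask);
            timeout.tv_sec = timeout_sec;
            timeout.tv_usec = 0;

            if (select(sock + 1, &input_mask, NULL, NULL, &timeout) <= 0)
                return -2;      /* timed out, or select() failed */

            if (!PQconsumeInput(conn))
                return -2;      /* connection to the server was lost */
        }
    }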
Re: [HACKERS] [PATCH] Patch to compute Max LSN of Data Pages
On Thu, Nov 15, 2012 at 12:08 AM, Amit Kapila amit.kap...@huawei.com wrote: Okay. So as Robert and Alvaro suggested to have it separate utility rather than having options in pg_resetxlog to print MAX LSN seems to be quite appropriate. I am planning to update the patch to make it a separate utility as pg_computemaxlsn with options same as what I have proposed for pg_resetxlog to print MAX LSN. So considering it a separate utility there can be 2 options: a. have a utility in contrib. b. have a utility in bin similar to pg_resetxlog I guess I'd vote for contrib, but I wouldn't be crushed if it went the other way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/14] Store the number of subtransactions in xl_running_xacts separately from toplevel xids
On 14 November 2012 22:17, Andres Freund and...@2ndquadrant.com wrote: To avoid complicating logic we store both, the toplevel and the subxids, in ->xip, first ->xcnt toplevel ones, and then ->subxcnt subxids.

That looks good, not much change. Will apply in next few days. Please add me as committer and mark ready.

Also skip logging any subxids if the snapshot is suboverflowed; they aren't useful in that case anyway. This allows making some operations cheaper and allows faster startup for the future logical decoding feature, because that doesn't care about subtransactions/suboverflow'edness.

...but please don't add extra touches of Andres magic along the way. Doing that will just slow down patch acceptance and it's not important. I suggest keeping note of things like that and coming back to them later.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/14] Store the number of subtransactions in xl_running_xacts separately from toplevel xids
On 2012-11-15 09:07:23 -0300, Simon Riggs wrote: On 14 November 2012 22:17, Andres Freund and...@2ndquadrant.com wrote: To avoid complicating logic we store both, the toplevel and the subxids, in ->xip, first ->xcnt toplevel ones, and then ->subxcnt subxids.

That looks good, not much change. Will apply in next few days. Please add me as committer and mark ready.

Cool. Will do.

Also skip logging any subxids if the snapshot is suboverflowed; they aren't useful in that case anyway. This allows making some operations cheaper and allows faster startup for the future logical decoding feature, because that doesn't care about subtransactions/suboverflow'edness.

...but please don't add extra touches of Andres magic along the way. Doing that will just slow down patch acceptance and it's not important. I suggest keeping note of things like that and coming back to them later.

Which magic are you talking about? Only two parts changed in comparison to the previous situation. One is that the following in ProcArrayApplyRecoveryInfo only applies to toplevel transactions by virtue of ->xcnt now only containing the toplevel transaction count:

+	/*
+	 * Remove stale locks, if any.
+	 *
+	 * Locks are always assigned to the toplevel xid so we don't need to care
+	 * about subxcnt/subxids (and by extension not about ->suboverflowed).
+	 */
	StandbyReleaseOldLocks(running->xcnt, running->xids);

Note that there was no code change, just a change in meaning. The other part is:

+	/*
+	 * Spin over procArray collecting all subxids, but only if there hasn't
+	 * been a suboverflow.
+	 */
+	if (!suboverflowed)

Well, that's something that basically had to be decided either way when writing the patch...

Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
On 15.11.2012 03:17, Andres Freund wrote: Features: - streaming reading/writing - filtering - reassembly of records Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea. Missing: - compressing the stream when removing uninteresting records - writing out correct CRCs - separating reader/writer I'm disappointed to see that there has been no progress on this patch since last commitfest. I thought we agreed on the approach I championed for here: http://archives.postgresql.org/pgsql-hackers/2012-09/msg00636.php. There wasn't much work left to finish that, I believe. Are you going to continue working on this? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add -Wlogical-op to standard compiler options?
Peter Eisentraut pete...@gmx.net writes: I think it might be worth adding -Wlogical-op to the standard warning options (for supported compilers, determined by configure test). Does that add any new warnings with the current source code, and if so what? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
On 2012-11-15 16:22:56 +0200, Heikki Linnakangas wrote: On 15.11.2012 03:17, Andres Freund wrote: Features: - streaming reading/writing - filtering - reassembly of records Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea. Missing: - compressing the stream when removing uninteresting records - writing out correct CRCs - separating reader/writer

I'm disappointed to see that there has been no progress on this patch since last commitfest. I thought we agreed on the approach I championed for here: http://archives.postgresql.org/pgsql-hackers/2012-09/msg00636.php. There wasn't much work left to finish that, I believe.

While I still think my approach is superior, at this point I have accepted that I haven't convinced anybody of that. I plan to port over what I have submitted to Alvaro's version of your patch. I had actually started that, but I simply couldn't finish it in time. The approach for porting I took didn't work all that well, and I plan to restart doing that after doing some review work.

Are you going to continue working on this?

"this" being my version of XLogReader? No. The patch above is unchanged except some very minor rebasing to recent wal changes by Tom. The reason it's included in the series is simply that I haven't gotten rid of it yet and the subsequent patches needed it. I do plan to continue working on a rebased xlogdump version if nobody beats me to it (please do beat me!). Ok?

The cover letter said: * Add support for a generic wal reading facility dubbed XLogReader There's some discussion about what's the best way to implement this in a separate CF topic. (unchanged) I should have folded that into the patch description, sorry.

Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On 23.10.2012 00:29, Alvaro Herrera wrote: Here's an updated version of this patch, which also works in an EXEC_BACKEND environment. (I haven't tested this at all on Windows, but I don't see anything that would create a portability problem there.)

Looks good at first glance. Fails on Windows, though:

C:\postgresql\pgsql.sln (default target) (1) ->
C:\postgresql\auth_counter.vcxproj (default target) (29) ->
(Link target) ->
  auth_counter.obj : error LNK2001: unresolved external symbol UnBlockSig [C:\postgresql\auth_counter.vcxproj]
  .\Release\auth_counter\auth_counter.dll : fatal error LNK1120: 1 unresolved externals [C:\postgresql\auth_counter.vcxproj]

C:\postgresql\pgsql.sln (default target) (1) ->
C:\postgresql\worker_spi.vcxproj (default target) (77) ->
  worker_spi.obj : error LNK2001: unresolved external symbol UnBlockSig [C:\postgresql\worker_spi.vcxproj]
  .\Release\worker_spi\worker_spi.dll : fatal error LNK1120: 1 unresolved externals [C:\postgresql\worker_spi.vcxproj]

Marking UnBlockSig with PGDLLIMPORT fixes that. But I wonder if it's a good idea to leave unblocking signals the responsibility of the user code in the first place? That seems like the kind of low-level stuff that you want to hide from extension writers.

Oh, and this needs docs.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
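To illustrate the responsibility being discussed: under the patch as it stands, each module's worker main function apparently has to do something like the following itself. This is a sketch only; the entry-point signature and handler are assumptions, while pqsignal(), PG_SETMASK and UnBlockSig are existing backend facilities:

    #include "postgres.h"
    #include "libpq/pqsignal.h"

    static volatile sig_atomic_t got_sigterm = false;

    static void
    my_sigterm_handler(SIGNAL_ARGS)
    {
        got_sigterm = true;
    }

    /* hypothetical worker entry point */
    static void
    my_worker_main(void *arg)
    {
        /* set up handlers first, while signals are still blocked */
        pqsignal(SIGTERM, my_sigterm_handler);

        /*
         * The postmaster starts workers with signals blocked; under the
         * current patch, user code must unblock them itself.  UnBlockSig
         * is the global that needed PGDLLIMPORT above.
         */
        PG_SETMASK(&UnBlockSig);

        while (!got_sigterm)
        {
            /* ... do the module's work, sleep, etc. ... */
        }
    }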
Re: [HACKERS] [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
Heikki Linnakangas wrote: On 15.11.2012 03:17, Andres Freund wrote: Features: - streaming reading/writing - filtering - reassembly of records Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea. Missing: - compressing the stream when removing uninteresting records - writing out correct CRCs - separating reader/writer

I'm disappointed to see that there has been no progress on this patch since last commitfest. I thought we agreed on the approach I championed for here: http://archives.postgresql.org/pgsql-hackers/2012-09/msg00636.php. There wasn't much work left to finish that, I believe. Are you going to continue working on this?

I worked a bit more on that patch of yours, but I neglected to submit it. Did you have something in particular that you wanted changed in it?

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
Heikki Linnakangas hlinnakan...@vmware.com writes: This is a fairly general issue, actually. Looking around, I can see at least two similar cases in existing code, with BasicOpenFile, where we will leak file descriptors on error: Um, don't we automatically clean those up during transaction abort? If we don't, we ought to think about that, not about cluttering calling code with certain-to-be-inadequate cleanup in error cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Wednesday, November 14, 2012 12:24 AM Robert Haas wrote: On Mon, Nov 12, 2012 at 10:59 PM, Amit kapila amit.kap...@huawei.com wrote:

Uh, no, I don't think that's a good idea. IMHO, what we should do is: 1. Read postgresql.conf.auto and remember all the settings we saw. If we see something funky like an include directive, barf. 2. Forget the value we remembered for the particular setting being changed. Instead, remember the user-supplied new value for that parameter. 3. Write a new postgresql.conf.auto based on the information remembered in steps 1 and 2.

The attached patch contains an implementation of the above concept. It can be changed to adapt the file writing based on GUC variables as described by me yesterday, or in some other better way. Currently, to remember and forget, I have used the below concept:
1. Read postgresql.auto.conf into memory.
2. Parse the GUC_file for the exact location of the changed variable.
3. Update the changed variable in memory at the location found in step 2.
4. Write the postgresql.auto.conf.

Overall changes:
1. include dir in postgresql.conf at time of initdb
2. new built-in function pg_change_conf to change configuration settings
3. write file as per the logic described above

Some more things left are:
1. Transactional capability for the command, so that the rename of the .lock file to .auto.conf can be done at commit time.

I am planning to upload the attached for this CF. Suggestions/Comments?

With Regards, Amit Kapila. pg_change_conf.patch Description: pg_change_conf.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
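The write-then-rename step mentioned under "things left" could look roughly like this. It is a sketch of the usual atomic-replace pattern with illustrative file and function names; the actual patch may arrange this differently:

    #include "postgres.h"
    #include <unistd.h>

    /*
     * Write the rewritten contents to a temporary ".lock" file and rename
     * it over postgresql.auto.conf, so a crash can never leave a
     * half-written config file behind.  rename() is atomic on POSIX
     * filesystems.
     */
    static void
    write_auto_conf(const char *contents)
    {
        const char *finalname = "postgresql.auto.conf";
        const char *tempname = "postgresql.auto.conf.lock";
        FILE       *fp;

        fp = fopen(tempname, "w");
        if (fp == NULL)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\": %m", tempname)));

        if (fputs(contents, fp) < 0 ||
            fflush(fp) != 0 ||
            fsync(fileno(fp)) != 0 ||
            fclose(fp) != 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not write file \"%s\": %m", tempname)));

        /* per the plan above, this rename would happen at commit time */
        if (rename(tempname, finalname) != 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not rename file \"%s\" to \"%s\": %m",
                            tempname, finalname)));
    }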
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
Heikki Linnakangas escribió: On 23.10.2012 00:29, Alvaro Herrera wrote: Here's an updated version of this patch, which also works in an EXEC_BACKEND environment. (I haven't tested this at all on Windows, but I don't see anything that would create a portability problem there.) Looks good at first glance.

Thanks.

Fails on Windows, though:

C:\postgresql\pgsql.sln (default target) (1) ->
C:\postgresql\auth_counter.vcxproj (default target) (29) ->
(Link target) ->
  auth_counter.obj : error LNK2001: unresolved external symbol UnBlockSig [C:\postgresql\auth_counter.vcxproj]
  .\Release\auth_counter\auth_counter.dll : fatal error LNK1120: 1 unresolved externals [C:\postgresql\auth_counter.vcxproj]

Wow. If that's the only problem it has on Windows, I am extremely pleased. Were you able to test the provided test modules? Only now I realise that they aren't very friendly because there's a hardcoded database name in there (alvherre, not the wisest choice I guess), but they should at least be able to run and not turn into a fork bomb due to being unable to connect, for instance.

Marking UnBlockSig with PGDLLIMPORT fixes that. But I wonder if it's a good idea to leave unblocking signals the responsibility of the user code in the first place? That seems like the kind of low-level stuff that you want to hide from extension writers.

Sounds sensible. I am unsure about the amount of pre-cooked stuff we need to provide. For instance, do we want some easy way to let the user code run transactions?

Oh, and this needs docs.

Hmm, yes it does.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: Comments? Suggestions?

It could use a run through pgindent. And the header comments are separated by a blank line from the functions to which they are attached, which is not project style.

+ if (heap->size + 1 == heap->space)
+     Assert("binary heap is full");

This doesn't really make any sense. Assert("binary heap is full") will never fail, because that expression is a character string which is, therefore, not NULL. I think you want Assert(heap->size < heap->space). Or you could use (heap->size + 1 >= heap->space) elog(LEVEL, "message") if there's some reason this needs to be checked in non-debug builds.

+ {
+     sift_down(heap, i);
+ }

Project style is to omit braces for a single-line body. This comes up a few other places as well.

Other than the coding style issues, I think this looks fine. If you can fix that up, I believe I could go ahead and commit this, unless anyone else objects.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
On 2012-11-15 11:50:37 -0300, Alvaro Herrera wrote: Heikki Linnakangas wrote: On 15.11.2012 03:17, Andres Freund wrote: Features: - streaming reading/writing - filtering - reassembly of records Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea. Missing: - compressing the stream when removing uninteresting records - writing out correct CRCs - separating reader/writer I'm disappointed to see that there has been no progress on this patch since last commitfest. I thought we agreed on the approach I championed for here: http://archives.postgresql.org/pgsql-hackers/2012-09/msg00636.php. There wasn't much work left to finish that, I believe. Are you going to continue working on this? I worked a bit more on that patch of yours, but I neglected to submit it. Did you have something in particular that you wanted changed in it? Could you push your newest version to your git repository or similar? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On 15.11.2012 16:55, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: This is a fairly general issue, actually. Looking around, I can see at least two similar cases in existing code, with BasicOpenFile, where we will leak file descriptors on error:

Um, don't we automatically clean those up during transaction abort?

Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files allocated with AllocateFile() and OpenTemporaryFile() are cleaned up at abort.

If we don't, we ought to think about that, not about cluttering calling code with certain-to-be-inadequate cleanup in error cases.

Agreed. Cleaning up at end-of-xact won't help walsender or other non-backend processes, though, because they don't do transactions. But a top-level ResourceOwner that's reset in the sigsetjmp() cleanup routine would work.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
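Mechanically, the top-level ResourceOwner idea might look like the following. It is only a sketch: today fd.c registers just temporary files with resource owners, so the files in question would also have to be taught to register themselves for this to actually release them:

    #include "postgres.h"
    #include <setjmp.h>
    #include "utils/resowner.h"

    static sigjmp_buf local_sigjmp_buf;

    void
    WalSenderMain(void)
    {
        /* a process-lifetime owner for a process that runs no transactions */
        CurrentResourceOwner = ResourceOwnerCreate(NULL, "walsender top-level");

        if (sigsetjmp(local_sigjmp_buf, 1) != 0)
        {
            /*
             * Error path: release everything registered with the owner,
             * including any tracked files, in the usual three phases.
             */
            ResourceOwnerRelease(CurrentResourceOwner,
                                 RESOURCE_RELEASE_BEFORE_LOCKS, false, true);
            ResourceOwnerRelease(CurrentResourceOwner,
                                 RESOURCE_RELEASE_LOCKS, false, true);
            ResourceOwnerRelease(CurrentResourceOwner,
                                 RESOURCE_RELEASE_AFTER_LOCKS, false, true);
        }

        /* ... main command loop ... */
    }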
Re: [HACKERS] [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
Andres Freund wrote: On 2012-11-15 11:50:37 -0300, Alvaro Herrera wrote: I worked a bit more on that patch of yours, but I neglected to submit it. Did you have something in particular that you wanted changed in it? Could you push your newest version to your git repository or similar?

Sadly, I cannot, because I had it on my laptop only and its screen died this morning (well, actually it doesn't boot at all, so I can't use the external screen either). I'm trying to get it fixed as soon as possible but obviously I have no idea when I will be able to get it back. Most likely I will have to go out and buy a 2.5" drive enclosure to get the valuable stuff out of it.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
Robert Haas escribió: On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: Comments? Suggestions? It could use a run through pgindent. And the header comments are separated by a blank line from the functions to which they are attached, which is not project style.

Also there are some comments in the .h file which we typically don't carry.

Other than the coding style issues, I think this looks fine. If you can fix that up, I believe I could go ahead and commit this, unless anyone else objects.

Does this include the changes in nodeMergeappend.c?

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On 15.11.2012 17:10, Alvaro Herrera wrote: Heikki Linnakangas escribió: On 23.10.2012 00:29, Alvaro Herrera wrote: Here's an updated version of this patch, which also works in an EXEC_BACKEND environment. (I haven't tested this at all on Windows, but I don't see anything that would create a portability problem there.) Looks good at first glance.

Thanks.

Fails on Windows, though:

I tested the auth_counter module, seemed to work. It counted all connections as successful, though, even when I tried to log in with an invalid username/database. Didn't try with an invalid password. And I didn't try worker_spi.

I am unsure about the amount of pre-cooked stuff we need to provide. For instance, do we want some easy way to let the user code run transactions?

Would be nice, of course. I guess it depends on how much work it would be to provide that. But we can leave that for later, once the base patch is in.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
Assert(!"description of error") is an idiom I've seen more than once, although I'm not sure I understand why it's not written as Robert says with the condition in the brackets (or as a print to STDERR followed by abort() instead). On 15 November 2012 15:11, Robert Haas robertmh...@gmail.com wrote: On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: Comments? Suggestions? It could use a run through pgindent. And the header comments are separated by a blank line from the functions to which they are attached, which is not project style. + if (heap->size + 1 == heap->space) + Assert("binary heap is full"); This doesn't really make any sense. Assert("binary heap is full") will never fail, because that expression is a character string which is, therefore, not NULL. I think you want Assert(heap->size < heap->space). Or you could use if (heap->size + 1 >= heap->space) elog(LEVEL, "message") if there's some reason this needs to be checked in non-debug builds. + { + sift_down(heap, i); + } Project style is to omit braces for a single-line body. This comes up a few other places as well. Other than the coding style issues, I think this looks fine. If you can fix that up, I believe I could go ahead and commit this, unless anyone else objects. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
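A compilable sketch of the assertion idioms under discussion (plain C assert(); the struct and capacity are made up for illustration, and the backend's Assert behaves analogously for this purpose):

#include <assert.h>

#define HEAP_SPACE 8

struct binheap
{
    int size;
    int space;
};

static void
binheap_add(struct binheap *heap)
{
    /*
     * Always-true string literal, the bug Robert points out: a non-empty
     * string is a non-NULL pointer, so this could never fire.
     *
     *     assert("binary heap is full");
     */

    /*
     * Negated string literal: fires unconditionally when reached, with the
     * text visible in the assertion failure message.
     */
    if (heap->size + 1 > heap->space)
        assert(!"binary heap is full");

    /* Conventional form: put the condition itself in the assertion. */
    assert(heap->size < heap->space);
    heap->size++;
}

int
main(void)
{
    struct binheap h = {0, HEAP_SPACE};

    binheap_add(&h);
    return 0;
}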
Re: [HACKERS] [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
On 15.11.2012 16:50, Alvaro Herrera wrote: Heikki Linnakangas wrote: On 15.11.2012 03:17, Andres Freund wrote: Features: - streaming reading/writing - filtering - reassembly of records Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea. Missing: - compressing the stream when removing uninteresting records - writing out correct CRCs - separating reader/writer I'm disappointed to see that there has been no progress on this patch since last commitfest. I thought we agreed on the approach I championed here: http://archives.postgresql.org/pgsql-hackers/2012-09/msg00636.php. There wasn't much work left to finish that, I believe. Are you going to continue working on this? I worked a bit more on that patch of yours, but I neglected to submit it. Did you have something in particular that you wanted changed in it? Off the top of my head, there were two open items with the patch as I submitted it: 1. Need to make sure it's easy to compile outside backend code. So that it's suitable for using in an xlogdump contrib module, for example. 2. do something about error reporting. In particular, xlogreader.c should not call emode_for_corrupt_record(), but we need to provide for that functionality somehow. I think I'd prefer xlogreader.c to not ereport() on a corrupt record. Instead, it would return an error string to the caller, which could then decide what to do with it. Translating the messages needs some thought, though. Other than those, and cleanup of any obsoleted comments etc. and adding docs, I think it was good to go. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila amit.kap...@huawei.com wrote: In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to check if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. 1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you. I can once validate the performance numbers again and do the code review for this patch during CF-3. However I am just not very sure about the use case, such that whether it is a sufficient use case. So I would like to ask opinion of other people as well. sure. I'd like to note though that hint bit i/o is a somewhat common complaint. it tends to most affect OLAP style workloads. in pathological workloads, it can really burn you -- it's not fun when you are i/o starved via sequential scan. This can still happen when sweeping dead records (which this patch doesn't deal with, though maybe it should). 2. After this patch, tuple hint bit is not set by Select operations after data populated by one transaction. This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry). Technically it does not save clog fetch as transam.c has a very similar cache mechanism. However, it does save a page write i/o and a lock on the page header, as well as a couple of other minor things. In the best case, the page write is completely masked as the page gets dirty for other reasons. I think this is going to become more important in checksum enabled scenarios. Though it is not apparent, we should see whether it can cause any other impact due to this: a. like maybe now VACUUM needs to set the hint bit, which may cause more I/O during Vacuum. IMNSHO, deferring non-critical i/o from foreground process to background process is generally good. VACUUM has nice features like i/o throttling and in place cancel so latent management of internal page optimization flags really belongs there ideally. Also, the longer you defer such I/O the more opportunity there is for it to be masked off by some other page dirtying operation (again, this is more important in the face of having to log hint bit changes). There could be some good rebuttal analysis though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
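For readers following along, here is a compilable sketch of the kind of single-entry commit-status cache being discussed, modeled loosely on what transam.c does with cachedFetchXid. The types and clog_fetch_status() are stand-ins invented for this illustration, not backend functions:

#include <stdio.h>

typedef unsigned int TransactionId;
typedef int XidStatus;

#define InvalidTransactionId ((TransactionId) 0)
#define XID_COMMITTED 1

static TransactionId cached_xid = InvalidTransactionId;
static XidStatus cached_status;

/* Stand-in for the slow path that actually reads the clog. */
static XidStatus
clog_fetch_status(TransactionId xid)
{
    printf("clog lookup for xid %u\n", xid);
    return XID_COMMITTED;
}

static XidStatus
fetch_commit_status(TransactionId xid)
{
    /*
     * Hit: the tuple's xmin/xmax matches the last xid looked up, so a
     * visibility check can proceed without dirtying the page to set a
     * hint bit.
     */
    if (xid == cached_xid)
        return cached_status;

    /* Miss: consult the clog and remember the answer. */
    cached_status = clog_fetch_status(xid);
    cached_xid = xid;
    return cached_status;
}

int
main(void)
{
    fetch_commit_status(100);   /* slow path */
    fetch_commit_status(100);   /* served from the cache */
    return 0;
}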
Re: [HACKERS] [PATCH] binary heap implementation
On 11/15/2012 10:11 AM, Robert Haas wrote: + { + sift_down(heap, i); + } Project style is to omit braces for a single-line body. This comes up a few other places as well. I thought we modified that some years ago, although my memory of it is a bit hazy. Personally I think it's a bad rule, at least as stated, in the case where there's an else clause with a compound statement. I'd prefer to see a rule that says that either both branches of an if/else should be compound statements or neither should be. I think the cases where only one branch is a compound statement are rather ugly. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On Tue, Oct 16, 2012 at 4:47 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: The same basic strategy for sizing the tuplesort memtuples array also exists in tuplestore. I wonder if we should repeat this there? I suppose that that could follow later. I think it'd be a good idea to either adjust tuplestore as well or explain in the comments there why the algorithm is different, because it has a comment which references this comment, and now the logic won't be in sync in spite of a comment implying that it is. I didn't initially understand why the proposed heuristic is safe. It's clear that we have to keep LACKMEM() from becoming true, or we'll bail out of this code with elog(ERROR). It will become true if we increase the size of the array by a number of elements which exceeds state->availMem / sizeof(SortTuple), but the actual number of elements being added is memtupsize * allowedMem / memNowUsed - memtupsize, which doesn't look closely related. However, I think I figured it out. We know that memNowUsed >= memtupsize * sizeof(SortTuple); substituting, we're adding no more than memtupsize * allowedMem / (memtupsize * sizeof(SortTuple)) - memtupsize elements, which simplifies to allowedMem / sizeof(SortTuple) - memtupsize = (allowedMem - sizeof(SortTuple) * memtupsize) / sizeof(SortTuple). Since we know availMem < allowedMem - sizeof(SortTuple) * memtupsize, that means the number of new elements is less than state->availMem/sizeof(SortTuple). Whew. In the attached version, I updated the comment to reflect this, and also reversed the order of the if/else block to put the shorter and more common case first, which I think is better style. I'm still not too sure why this part is OK: /* This may not be our first time through */ if (newmemtupsize <= memtupsize) return false; Suppose we enlarge the memtuples array by something other than a multiple of 2, and then we kick out all of the tuples that are currently in memory and load a new group with a smaller average size. ISTM that it could now appear that the memtuples array can be usefully further enlarged, perhaps by as little as one tuple, and that this could happen repeatedly. I think we should either refuse to grow the array unless we can increase it by at least, say, 10%, or else set a flag after the final enlargement that acts as a hard stop to any further growth. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company sortmem_grow-v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
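A standalone sketch of the growth heuristic being analyzed (illustrative arithmetic only, not the patch itself; the names and the half-memory threshold mirror the discussion but are assumptions here):

#include <stdio.h>

#define SORT_TUPLE_SIZE 16L   /* stand-in for sizeof(SortTuple) */

/*
 * Returns the proposed new array size, or the old size if no growth is
 * warranted; mirrors the shape of grow_memtuples() as discussed above.
 */
static long
propose_growth(long memtupsize, long allowedMem, long memNowUsed)
{
    long newmemtupsize;

    if (memNowUsed <= allowedMem / 2)
    {
        /* Plenty of room: classic doubling. */
        newmemtupsize = memtupsize * 2;
    }
    else
    {
        /*
         * Scale by the ratio of allowed to used memory.  Because
         * memNowUsed >= memtupsize * SORT_TUPLE_SIZE, the number of
         * elements added here cannot exceed availMem / SORT_TUPLE_SIZE,
         * which is what keeps LACKMEM() from firing.  Note the product
         * memtupsize * allowedMem: its overflow risk is exactly what the
         * thread turns to below.
         */
        newmemtupsize = memtupsize * allowedMem / memNowUsed;
    }

    /* This may not be our first time through. */
    if (newmemtupsize <= memtupsize)
        return memtupsize;
    return newmemtupsize;
}

int
main(void)
{
    printf("%ld\n", propose_growth(1024, 1024L * 1024, 700L * 1024));
    return 0;
}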
Re: [HACKERS] [PATCH] binary heap implementation
On Thu, Nov 15, 2012 at 10:21 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Other than the coding style issues, I think this looks fine. If you can fix that up, I believe I could go ahead and commit this, unless anyone else objects. Does this include the changes in nodeMergeappend.c? I think so. I was going to double-check the performance before committing, but it doesn't look like there should be a problem. Coding style changes appear needed in that file as well, however. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: There are two or three places in the Postgres source that implement heap sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the BDR code. It seemed reasonable to factor out the functionality. pg_dump also contains a binary heap implementation if memory serves right, which makes me wonder whether we should try to make binaryheap.[ch] backend clean... It currently uses palloc/pfree. Too bad we don't have a memory allocation routine which easily works in the backend and frontend... palloc references MemoryContextAlloc and CurrentMemoryContext so that's not that easy to emulate. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
Heikki Linnakangas wrote: On 15.11.2012 16:50, Alvaro Herrera wrote: I worked a bit more on that patch of yours, but I neglected to submit it. Did you have something in particular that you wanted changed in it? Off the top of my head, there were two open items with the patch as I submitted it: 1. Need to make sure it's easy to compile outside backend code. So that it's suitable for using in an xlogdump contrib module, for example. 2. do something about error reporting. In particular, xlogreader.c should not call emode_for_corrupt_record(), but we need to provide for that functionality somehow. I think I'd prefer xlogreader.c to not ereport() on a corrupt record. Instead, it would return an error string to the caller, which could then decide what to do with it. Translating the messages needs some thought, though. Other than those, and cleanup of any obsoleted comments etc. and adding docs, I think it was good to go. Thanks. I was toying with the idea that xlogreader.c should return a status code to the caller, and additionally an error string; not all error cases are equal. Most of what I did (other than general cleanup) was moving some xlog.c global vars into a private_data struct for xlogreader.c to pass around; one problem I had was deciding what to do with curFileTLI and LastFileTLI (IIRC), because they are used outside of the reader module (they were examined after recovery finished). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, November 15, 2012 9:27 PM Merlin Moncure wrote: On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila amit.kap...@huawei.com wrote: In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to check if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. 1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you. I can once validate the performance numbers again and do the code review for this patch during CF-3. However I am just not very sure about the use case, such that whether it is a sufficient use case. So I would like to ask opinion of other people as well. sure. I'd like to note though that hint bit i/o is a somewhat common complaint. it tends to most affect OLAP style workloads. in pathological workloads, it can really burn you -- it's not fun when you are i/o starved via sequential scan. This can still happen when sweeping dead records (which this patch doesn't deal with, though maybe it should). 2. After this patch, tuple hint bit is not set by Select operations after data populated by one transaction. This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry). Technically it does not save clog fetch as transam.c has a very similar cache mechanism. However, it does save a page write i/o and a lock on the page header, as well as a couple of other minor things. In the best case, the page write is completely masked as the page gets dirty for other reasons. I think this is going to become more important in checksum enabled scenarios. Though it is not apparent, we should see whether it can cause any other impact due to this: a. like maybe now VACUUM needs to set the hint bit, which may cause more I/O during Vacuum. IMNSHO, deferring non-critical i/o from foreground process to background process is generally good. Yes, in regard of deferring you are right. However in this case, maybe when the foreground process has to mark the page dirty due to a hint bit, it was already dirty, so no extra I/O; but when it is done by VACUUM, the page may not be dirty. Also due to the below points, doing it in VACUUM may cost more: a. VACUUM has a ring buffer of fixed size, and if such pages are many then a write of the page needs to be done by VACUUM to replace an existing page in the ring. b. Considering that sometimes people want VACUUM to run when the system is not busy, the chances of generating more overall I/O in the system can be higher. Why can't we avoid setting the hint bit in VACUUM? Is it due to the reason that it has to be done somewhere, so that hint bits are properly set? Or maybe I am missing something trivial? Though the VACUUM case I am talking about occurs rarely in practice, the point came to my mind, so I thought of sharing it with you. VACUUM has nice features like i/o throttling and in place cancel so latent management of internal page optimization flags really belongs there ideally. Also, the longer you defer such I/O the more opportunity there is for it to be masked off by some other page dirtying operation (again, this is more important in the face of having to log hint bit changes). There could be some good rebuttal analysis though. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
Andrew Dunstan wrote: On 11/15/2012 10:11 AM, Robert Haas wrote: + { + sift_down(heap, i); + } Project style is to omit braces for a single-line body. This comes up a few other places as well. I thought we modified that some years ago, although my memory of it is a bit hazy. No, we only modified pgindent to not take the braces off, because of PG_TRY blocks. But we keep using single statements instead of compound in many places; there is no hard rule about it. To me, using braces where they are not needed is pointless and ugly. We're very careful about ensuring that our macro definitions work nicely with single-statement if/else, for example. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
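The macro point deserves a concrete example. A standalone sketch of the conventional do { } while (0) wrapper (the macro name is invented for illustration) and why it matters when callers omit braces:

#include <stdio.h>

/*
 * A naive two-statement macro would break the if/else below, because only
 * its first statement would be governed by the if:
 *
 *     #define SWAP_BAD(a, b)  int t = (a); (a) = (b); (b) = t
 *
 * Wrapping the body in do { } while (0) makes the expansion a single
 * statement, so it composes correctly with braceless if/else.
 */
#define SWAP(a, b) \
    do { int t = (a); (a) = (b); (b) = t; } while (0)

int
main(void)
{
    int x = 1, y = 2;

    if (x < y)
        SWAP(x, y);             /* legal: macro acts as one statement */
    else
        printf("already ordered\n");

    printf("%d %d\n", x, y);
    return 0;
}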
Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool
On 11/14/12 8:17 PM, Andres Freund wrote:

diff --git a/src/bin/Makefile b/src/bin/Makefile
index b4dfdba..9992f7a 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -14,7 +14,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 SUBDIRS = initdb pg_ctl pg_dump \
-	psql scripts pg_config pg_controldata pg_resetxlog pg_basebackup
+	psql scripts pg_config pg_controldata pg_resetxlog pg_basebackup xlogdump

should be pg_xlogdump

 ifeq ($(PORTNAME), win32)
 SUBDIRS += pgevent

diff --git a/src/bin/xlogdump/Makefile b/src/bin/xlogdump/Makefile
new file mode 100644
index 0000000..d54640a
--- /dev/null
+++ b/src/bin/xlogdump/Makefile
@@ -0,0 +1,25 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/xlogdump
+#
+# Copyright (c) 1998-2012, PostgreSQL Global Development Group
+#
+# src/bin/pg_resetxlog/Makefile

fix that

+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = xlogdump
+PGAPPICON=win32
+
+subdir = src/bin/xlogdump
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= xlogdump.o \
+	$(WIN32RES)
+
+all: xlogdump
+
+xlogdump: $(OBJS) $(shell find ../../backend ../../timezone -name objfiles.txt|xargs cat|tr -s '\012'|grep -v /main.o|sed 's/^/..\/..\/..\//')
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)

This looks pretty evil, and there is no documentation about what it is supposed to do. Windows build support needs some thought.

diff --git a/src/bin/xlogdump/xlogdump.c b/src/bin/xlogdump/xlogdump.c
new file mode 100644
index 0000000..0f984e4
--- /dev/null
+++ b/src/bin/xlogdump/xlogdump.c
@@ -0,0 +1,468 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/xlogreader.h"
+#include "access/rmgr.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "utils/memutils.h"
+#include "utils/guc.h"
+
+#include "getopt_long.h"
+
+/*
+ * needs to be declared because otherwise its defined in main.c which we cannot
+ * link from here.
+ */
+const char *progname = "xlogdump";

Which may be a reason not to link with main.o. We generally don't want to hardcode the program name inside the program.

+static void
+usage(void)
+{
+	printf(_("%s reads/writes postgres transaction logs for debugging.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]...\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -v, --version   output version information, then exit\n"));
+	printf(_("  -h, --help      show this help, then exit\n"));
+	printf(_("  -s, --start     from where recptr onwards to read\n"));
+	printf(_("  -e, --end       up to which recptr to read\n"));
+	printf(_("  -t, --timeline  which timeline do we want to read\n"));
+	printf(_("  -i, --inpath    from where do we want to read? cwd/pg_xlog is the default\n"));
+	printf(_("  -o, --output    where to write [start, end]\n"));
+	printf(_("  -f, --file      wal file to parse\n"));
+}

Options list should be in alphabetic order (or some other less random order). Most of these descriptions are not very intelligible (at least without additional documentation).

+int main(int argc, char **argv)
+{
+	uint32 xlogid;
+	uint32 xrecoff;
+	XLogReaderState *xlogreader_state;
+	XLogDumpPrivateData private;
+	XLogRecPtr from = InvalidXLogRecPtr;
+	XLogRecPtr to = InvalidXLogRecPtr;
+	bool bad_argument = false;
+
+	static struct option long_options[] = {
+		{"help", no_argument, NULL, 'h'},
+		{"version", no_argument, NULL, 'v'},

Standard letters for help and version are ? and V.

+		{"start", required_argument, NULL, 's'},
+		{"end", required_argument, NULL, 'e'},
+		{"timeline", required_argument, NULL, 't'},
+		{"inpath", required_argument, NULL, 'i'},
+		{"outpath", required_argument, NULL, 'o'},
+		{"file", required_argument, NULL, 'f'},
+		{NULL, 0, NULL, 0}
+	};
+	int c;
+	int option_index;
+
+	memset(&private, 0, sizeof(XLogDumpPrivateData));
+
+	while ((c = getopt_long(argc, argv, "hvs:e:t:i:o:f:",

This could also be in a less random order.

+		long_options, &option_index)) != -1)
+	{
+		switch (c)
+		{
+			case 'h':
+				usage();
+				exit(0);
+				break;
+			case 'v':
+				printf("Version: 0.1\n");
+				exit(0);
+				break;

This should be the PostgreSQL version. also: no man page, no nls.mk --
Re: [HACKERS] [PATCH] binary heap implementation
On Thu, Nov 15, 2012 at 11:22 AM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: There are two or three places in the Postgres source that implement heap sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the BDR code. It seemed reasonable to factor out the functionality. pg_dump also contains a binary heap implementation if memory serves right, which makes me wonder whether we should try to make binaryheap.[ch] backend clean... It currently uses palloc/pfree. Too bad we don't have a memory allocation routine which easily works in the backend and frontend... palloc references MemoryContextAlloc and CurrentMemoryContext so that's not that easy to emulate. I think there's a clear need for a library of code that can be shared by frontend and backend code. I don't necessarily want to punt everything into libpq, though. What I wonder if we might do is create something like src/lib/pgguts that contains routines for memory allocation, error handling, stringinfos (so that you don't have to use PQexpBuffer in the frontend and StringInfo in the backend), etc. and make that usable by both front-end and back-end code. I think that would be a tremendously nice thing to have. I don't think we should make it this patch's problem to do that, but I'd be strongly in favor of such a thing if someone wants to put it together. It's been a huge nuisance for me in the past and all indications are that the work you're doing on LR is just going to exacerbate that problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool
On 2012-11-15 11:31:55 -0500, Peter Eisentraut wrote: On 11/14/12 8:17 PM, Andres Freund wrote:

diff --git a/src/bin/Makefile b/src/bin/Makefile
index b4dfdba..9992f7a 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -14,7 +14,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 SUBDIRS = initdb pg_ctl pg_dump \
-	psql scripts pg_config pg_controldata pg_resetxlog pg_basebackup
+	psql scripts pg_config pg_controldata pg_resetxlog pg_basebackup xlogdump

should be pg_xlogdump

Good Point.

 ifeq ($(PORTNAME), win32)
 SUBDIRS += pgevent

diff --git a/src/bin/xlogdump/Makefile b/src/bin/xlogdump/Makefile
new file mode 100644
index 0000000..d54640a
--- /dev/null
+++ b/src/bin/xlogdump/Makefile
@@ -0,0 +1,25 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/xlogdump
+#
+# Copyright (c) 1998-2012, PostgreSQL Global Development Group
+#
+# src/bin/pg_resetxlog/Makefile

fix that

Dito.

+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = xlogdump
+PGAPPICON=win32
+
+subdir = src/bin/xlogdump
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= xlogdump.o \
+	$(WIN32RES)
+
+all: xlogdump
+
+xlogdump: $(OBJS) $(shell find ../../backend ../../timezone -name objfiles.txt|xargs cat|tr -s '\012'|grep -v /main.o|sed 's/^/..\/..\/..\//')
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)

This looks pretty evil, and there is no documentation about what it is supposed to do.

There has been some talk about this before and this clearly isn't an acceptable solution. The previously stated idea was to split off the _desc routines so we don't need to link with the whole backend. Alvaro started to work on that a bit: http://archives.postgresql.org/message-id/1346268803-sup-9854%40alvh.no-ip.org (What the above does is simply collect all backend object files, remove main.o from that list and list them as dependencies.)

Windows build support needs some thought.

I don't have the slightest clue how the windows build environment works, is there still a problem if we only link to a very selected list of backend object files? Or do we need to link them to some external location?

diff --git a/src/bin/xlogdump/xlogdump.c b/src/bin/xlogdump/xlogdump.c
new file mode 100644
index 0000000..0f984e4
--- /dev/null
+++ b/src/bin/xlogdump/xlogdump.c
@@ -0,0 +1,468 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/xlogreader.h"
+#include "access/rmgr.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "utils/memutils.h"
+#include "utils/guc.h"
+
+#include "getopt_long.h"
+
+/*
+ * needs to be declared because otherwise its defined in main.c which we cannot
+ * link from here.
+ */
+const char *progname = "xlogdump";

Which may be a reason not to link with main.o.

Well, we're not linking to main.o which causes the problem, but yes, really fixing this is definitely the goal, but not really possible yet.

+static void
+usage(void)
+{
+	printf(_("%s reads/writes postgres transaction logs for debugging.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]...\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -v, --version   output version information, then exit\n"));
+	printf(_("  -h, --help      show this help, then exit\n"));
+	printf(_("  -s, --start     from where recptr onwards to read\n"));
+	printf(_("  -e, --end       up to which recptr to read\n"));
+	printf(_("  -t, --timeline  which timeline do we want to read\n"));
+	printf(_("  -i, --inpath    from where do we want to read? cwd/pg_xlog is the default\n"));
+	printf(_("  -o, --output    where to write [start, end]\n"));
+	printf(_("  -f, --file      wal file to parse\n"));
+}

Options list should be in alphabetic order (or some other less random order). Most of these descriptions are not very intelligible (at least without additional documentation).

True, it's noticeable that this mostly was a development tool. But it shouldn't stay that way. There have been several bug reports of late where a bin/pg_xlogdump would have been very helpful...

This should be the PostgreSQL version. also: no man page, no nls.mk

Will try to provide some actually submittable version once the xlogreader situation is finalized and the _desc routines are split... Thanks! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com wrote: IMNSHO, deferring non-critical i/o from foreground process to background process is generally good. Yes, in regard of deferring you are right. However in this case, maybe when the foreground process has to mark the page dirty due to a hint bit, it was already dirty, so no extra I/O; but when it is done by VACUUM, the page may not be dirty. Yeah. We can try to be smart and set the hint bits in that case. Not sure that will work out with checksums having to wal log hint bits though (which by reading the checksum threads seems likely to happen). Also due to the below points, doing it in VACUUM may cost more: a. VACUUM has a ring buffer of fixed size, and if such pages are many then a write of the page needs to be done by VACUUM to replace an existing page in the ring. Sure, although in scans we are using a ring buffer as well so in a practical sense the results are pretty close. b. Considering that sometimes people want VACUUM to run when the system is not busy, the chances of generating more overall I/O in the system can be higher. It's hard to imagine overall i/o load increasing. However, longer vacuum times should be considered. A bigger issue is that we are missing an early opportunity to set the all visible bit. vacuum is doing: if (all_visible) { TransactionId xmin; if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED)) { all_visible = false; break; } assuming the cache is working and vacuum rolls around after a scan, you lost the opportunity to set the all_visible flag where previously it would have been set, thereby dismantling the positive effect of an index only scan. AFAICT, this is the only case where vacuum is directly interfacing with hint bits. This could be construed as a violation of the heapam API? Maybe if that's an issue we could proxy that check to a heaptuple/tqual.c maintained function (in the same manner as HeapTupleSetHintBits) so that the cache bit could be uniformly checked. All other *setting* of hint bits is running through SetHintBits(), so I think we are safe from the vacuum point of view. That's another thing to test for though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
On 2012-11-15 11:37:16 -0500, Robert Haas wrote: On Thu, Nov 15, 2012 at 11:22 AM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: There are two or three places in the Postgres source that implement heap sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the BDR code. It seemed reasonable to factor out the functionality. pg_dump also contains a binary heap implementation if memory serves right, which makes me wonder whether we should try to make binaryheap.[ch] backend clean... It currently uses palloc/pfree. Too bad we don't have a memory allocation routine which easily works in the backend and frontend... palloc references MemoryContextAlloc and CurrentMemoryContext so that's not that easy to emulate. I think there's a clear need for a library of code that can be shared by frontend and backend code. I don't necessarily want to punt everything into libpq, though. Agreed. What I wonder if we might do is create something like src/lib/pgguts that contains routines for memory allocation, error handling, stringinfos (so that you don't have to use PQexpBuffer in the frontend and StringInfo in the backend), etc. and make that usable by both front-end and back-end code. I think that would be a tremendously nice thing to have. Yes. But it also sounds like quite a bit of work :( I don't think we should make it this patch's problem to do that Me neither. I was thinking about letting the user allocate enough memory like: binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10)); binaryheap_init(heap, 10, comparator); But that's pretty ugly. but I'd be strongly in favor of such a thing if someone wants to put it together. It's been a huge nuisance for me in the past and all indications are that the work you're doing on LR is just going to exacerbate that problem. I actually hope I won't have to fight with that all that much, but we will see :/ Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
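For illustration, here is a standalone sketch of the "caller allocates" API shape Andres describes, using a flexible array member so that binaryheap_size() can report the bytes needed for a given capacity. This is not the committed API; field and function names are assumptions modeled on the two lines quoted above:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*bh_comparator) (int a, int b);

typedef struct binaryheap
{
    int           bh_size;     /* elements currently stored */
    int           bh_space;    /* capacity */
    bh_comparator bh_compare;
    int           bh_nodes[];  /* flexible array member */
} binaryheap;

static size_t
binaryheap_size(int capacity)
{
    return offsetof(binaryheap, bh_nodes) + capacity * sizeof(int);
}

static void
binaryheap_init(binaryheap *heap, int capacity, bh_comparator cmp)
{
    heap->bh_size = 0;
    heap->bh_space = capacity;
    heap->bh_compare = cmp;
}

static int
int_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

int
main(void)
{
    /* The caller picks the allocator: palloc in the backend, malloc here. */
    binaryheap *heap = malloc(binaryheap_size(10));

    binaryheap_init(heap, 10, int_cmp);
    printf("bytes for 10 nodes: %zu\n", binaryheap_size(10));
    free(heap);
    return 0;
}

The appeal of the pattern is that the heap itself never calls an allocator, which is what makes it indifferent to the backend/frontend split; the ugliness Andres concedes is that every caller must perform the two-step dance.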
Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool
On Wed, Nov 14, 2012 at 5:17 PM, Andres Freund and...@2ndquadrant.com wrote: --- src/bin/Makefile| 2 +- src/bin/xlogdump/Makefile | 25 +++ src/bin/xlogdump/xlogdump.c | 468 3 files changed, 494 insertions(+), 1 deletion(-) create mode 100644 src/bin/xlogdump/Makefile create mode 100644 src/bin/xlogdump/xlogdump.c Is this intended to be the successor of https://github.com/snaga/xlogdump which will then be deprecated? Thanks, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] binary heap implementation
On Thu, Nov 15, 2012 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: Me neither. I was thinking about letting the user allocate enough memory like: binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10)); binaryheap_init(heap, 10, comparator); But that's pretty ugly. Yeah. I would vote against doing that for now. We can always uglify the code later if we decide it's absolutely necessary. One trick that we could potentially use to make code run in the frontend and backend is to put it in src/port. That code gets compiled twice, once with -DFRONTEND and once without. So you can use #ifdef to handle things that need to work different ways. The only real problem with that approach is that you end up putting the code in libpgport where it seems rather out of place. Still, maybe we could have a src/framework directory that uses the same trick to produce a libpgframework that frontend code can use, and a libpgframework_srv that can be linked into the backend. That might not actually be that much work; we'd just need to clone all the existing src/port logic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
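A minimal sketch of the src/port-style trick Robert describes: one source file compiled twice, with -DFRONTEND selecting the behaviour. The function name is invented for illustration, and the backend branch only gestures at palloc so that the sketch stays compilable standalone:

#include <stdio.h>
#include <stdlib.h>

#ifdef FRONTEND
/* Frontend build: plain malloc, print-and-exit on failure. */
static void *
shared_alloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return p;
}
#else
/*
 * Backend build would use palloc() here, which ereports on failure;
 * shown as a comment so this sketch compiles on its own.
 */
static void *
shared_alloc(size_t size)
{
    /* return palloc(size); */
    return malloc(size);
}
#endif

int
main(void)
{
    char *buf = shared_alloc(64);

#ifdef FRONTEND
    snprintf(buf, 64, "compiled with -DFRONTEND");
#else
    snprintf(buf, 64, "compiled for the backend");
#endif
    puts(buf);
    free(buf);
    return 0;
}

Building the same file with and without -DFRONTEND (e.g. cc -DFRONTEND shared_alloc.c) yields the two variants, which is exactly how libpgport ends up usable on both sides.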
Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool
On 2012-11-15 09:06:23 -0800, Jeff Janes wrote: On Wed, Nov 14, 2012 at 5:17 PM, Andres Freund and...@2ndquadrant.com wrote: --- src/bin/Makefile | 2 +- src/bin/xlogdump/Makefile | 25 +++ src/bin/xlogdump/xlogdump.c | 468 3 files changed, 494 insertions(+), 1 deletion(-) create mode 100644 src/bin/xlogdump/Makefile create mode 100644 src/bin/xlogdump/xlogdump.c Is this intended to be the successor of https://github.com/snaga/xlogdump which will then be deprecated? As-is this is just a development tool which was sorely needed for the development of this patchset. But yes I think that once ready (xlogreader infrastructure, *_desc routines split) it should definitely be able to do most of what the above xlogdump can do and it should live in bin/. I think mostly some filtering is missing. That doesn't really deprecate the above though. Does that answer your question? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 15 November 2012 16:09, Robert Haas robertmh...@gmail.com wrote: [Lots of complicated commentary on tuplesort variables] Whew. In the attached version, I updated the comment to reflect this, and also reversed the order of the if/else block to put the shorter and more common case first, which I think is better style. Fine by me. In conversation with Greg Stark in Prague, he seemed to think that there was an integer overflow hazard in v3, which is, in terms of the machine code it will produce, identical to your revision. He pointed this out to me when we were moving between sessions, and I only briefly looked over his shoulder, so while I don't want to misrepresent what he said, I *think* he said that this could overflow: + newmemtupsize = memtupsize * allowedMem / memNowUsed; Having only looked at this properly now, I don't think that that's actually the case, but I thought that it should be noted. I'm sorry if it's unfair to Greg to correct him, even though that's something he didn't formally put on the record, and may have only looked at briefly. I can see why it might have looked like a concern, since we're assigning the result of an arithmetic expression involving long variables to an int, newmemtupsize. The fact of the matter is that we're already asserting that: + Assert(newmemtupsize <= state->memtupsize * 2); Previously, we'd just have doubled memtupsize anyway. So any eventuality in which newmemtupsize overflows ought to be logically impossible. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 15 November 2012 16:09, Robert Haas robertmh...@gmail.com wrote: I'm still not too sure why this part is OK: /* This may not be our first time through */ if (newmemtupsize <= memtupsize) return false; Suppose we enlarge the memtuples array by something other than a multiple of 2, and then we kick out all of the tuples that are currently in memory and load a new group with a smaller average size. ISTM that it could now appear that the memtuples array can be usefully further enlarged, perhaps by as little as one tuple, and that this could happen repeatedly. I think we should either refuse to grow the array unless we can increase it by at least, say, 10%, or else set a flag after the final enlargement that acts as a hard stop to any further growth. I thought that the flag added to Tuplesortstate in earlier revisions was a little bit ugly. I can see the concern though, and I suppose that given the alternative is to add a heuristic, simply using a flag may be the best way to proceed. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] another idea for changing global configuration settings from SQL
Independent of the discussion of how to edit configuration files from SQL, I had another idea how many of the use cases for this could be handled. We already have the ability to store in pg_db_role_setting configuration settings for:
- specific user, specific database
- specific user, any database
- any user, specific database
The existing infrastructure would also support:
- any user, any database (= all the time)
All you'd need is to add ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_$SOMETHING); in postinit.c, and have some SQL command to modify this setting. The only thing you couldn't handle that way are SIGHUP settings, but the often-cited use cases work_mem, logging, etc. would work. There would also be the advantage that pg_dumpall would save these settings. Thoughts? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
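A standalone toy model of the lookup being proposed, to show how an (InvalidOid, InvalidOid) row would slot into the existing precedence order. Everything here is invented for illustration (in the real backend all matching rows are applied and precedence is resolved through GUC source levels; first-match-wins below is a simplification):

#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

struct setting_row
{
    Oid         db;
    Oid         role;
    const char *value;
};

/* Simulated pg_db_role_setting contents, including the proposed global row. */
static struct setting_row rows[] = {
    {16384, 10, "work_mem=64MB"},             /* specific db, specific role */
    {InvalidOid, 10, "work_mem=32MB"},        /* any db, specific role */
    {16384, InvalidOid, "work_mem=16MB"},     /* specific db, any role */
    {InvalidOid, InvalidOid, "work_mem=8MB"}, /* proposed: any db, any role */
};

static const char *
lookup(Oid db, Oid role)
{
    /* Probe from most specific to least specific. */
    Oid probes[4][2] = {
        {db, role},
        {InvalidOid, role},
        {db, InvalidOid},
        {InvalidOid, InvalidOid}
    };
    int p, i;

    for (p = 0; p < 4; p++)
        for (i = 0; i < 4; i++)
            if (rows[i].db == probes[p][0] && rows[i].role == probes[p][1])
                return rows[i].value;
    return "(built-in default)";
}

int
main(void)
{
    /* An unknown db/role pair falls through to the global row. */
    printf("%s\n", lookup(99999, 42));
    return 0;
}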
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Thursday, November 15, 2012 15:48:14, Amit Kapila wrote: On Wednesday, November 14, 2012 12:24 AM Robert Haas wrote: On Mon, Nov 12, 2012 at 10:59 PM, Amit kapila amit.kap...@huawei.com wrote: Uh, no, I don't think that's a good idea. IMHO, what we should do is: 1. Read postgresql.conf.auto and remember all the settings we saw. If we see something funky like an include directive, barf. 2. Forget the value we remembered for the particular setting being changed. Instead, remember the user-supplied new value for that parameter. 3. Write a new postgresql.conf.auto based on the information remembered in steps 1 and 2. Attached patch contains implementation for the above concept. It can be changed to adapt the write file based on GUC variables as described by me yesterday or in some other better way. Currently, to remember and forget, I have used the below concept: 1. Read postgresql.auto.conf in memory. 2. parse the GUC_file for exact location of changed variable 3. update the changed variable in memory at location found in step-2 4. Write the postgresql.auto.conf Overall changes: 1. include dir in postgresql.conf at time of initdb 2. new built-in function pg_change_conf to change configuration settings 3. write file as per logic described above. Some more things left are: 1. Transactional capability to command, so that rename of .lock file to .auto.conf can be done at commit time. I am planning to upload the attached for this CF. Suggestions/Comments? Yes, sorry to jump in here so late. * Why don't we use pg_settings? (i.e. why not materialize the view and use it; it should be easier to have something transactional and also serializable, with probably a DEFERRABLE select pg_reload_config() which mv's the configuration file at commit time) * Can I define automatic parameters to be loaded before and/or after the non-automatic parameters in a convenient way (without editing files at all)? -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] another idea for changing global configuration settings from SQL
On Thu, Nov 15, 2012 at 12:53 PM, Peter Eisentraut pete...@gmx.net wrote: Independent of the discussion of how to edit configuration files from SQL, I had another idea how many of the use cases for this could be handled. We already have the ability to store in pg_db_role_setting configuration settings for:
- specific user, specific database
- specific user, any database
- any user, specific database
The existing infrastructure would also support:
- any user, any database (= all the time)
All you'd need is to add ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_$SOMETHING); in postinit.c, and have some SQL command to modify this setting. The only thing you couldn't handle that way are SIGHUP settings, but the often-cited use cases work_mem, logging, etc. would work. There would also be the advantage that pg_dumpall would save these settings. Thoughts? Personally, I think that would be wonderful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] another idea for changing global configuration settings from SQL
On Thursday, November 15, 2012 18:53:15, Peter Eisentraut wrote: Independent of the discussion of how to edit configuration files from SQL, I had another idea how many of the use cases for this could be handled. We already have the ability to store in pg_db_role_setting configuration settings for:
- specific user, specific database
- specific user, any database
- any user, specific database
The existing infrastructure would also support:
- any user, any database (= all the time)
All you'd need is to add ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_$SOMETHING); in postinit.c, and have some SQL command to modify this setting. The only thing you couldn't handle that way are SIGHUP settings, but the often-cited use cases work_mem, logging, etc. would work. There would also be the advantage that pg_dumpall would save these settings. Thoughts? I like the idea. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On Thu, Nov 15, 2012 at 12:14 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 15 November 2012 16:09, Robert Haas robertmh...@gmail.com wrote: [Lots of complicated commentary on tuplesort variables] Whew. In the attached version, I updated the comment to reflect this, and also reversed the order of the if/else block to put the shorter and more common case first, which I think is better style. Fine by me. In conversation with Greg Stark in Prague, he seemed to think that there was an integer overflow hazard in v3, which is, in terms of the machine code it will produce, identical to your revision. He pointed this out to me when we were moving between sessions, and I only briefly looked over his shoulder, so while I don't want to misrepresent what he said, I *think* he said that this could overflow: + newmemtupsize = memtupsize * allowedMem / memNowUsed; Ah, yeah. I wondered in passing about that but forgot to follow up on it. The problem specifically is that the intermediate result memtupsize * newmemtuples might overflow. I believe that the old memtupsize can never be more than 2^26 elements, because the allocation limit is 1GB and each SortTuple is 16 bytes. So if this is done as a 32-bit calculation, we'll potentially overflow if allowedMem is more than 64 bytes i.e. almost for sure. If it is done as a 64-bit calculation we'll potentially overflow if allowedMem is more than 2^38 bytes = 256GB. The actual declared type is long which I assume is probably 64 bits at least for most people these days, but maybe not for everyone, and even 256GB is not necessarily safe. The highest value for work_mem I can set here is 2047GB. So I think there is an issue here, unless there's something wrong with my analysis. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
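A standalone demonstration of the hazard Robert describes: the product is formed before the division, so it can wrap even when the final quotient is small. Unsigned arithmetic is used for the wrapping case so the behaviour stays well defined; the particular constants follow the 1GB / 16-byte analysis above:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t memtupsize = UINT32_C(1) << 26; /* 1GB / 16-byte SortTuples */
    uint32_t allowedMem = 64;                /* bytes */
    uint32_t memNowUsed = 32;

    /* 2^26 * 64 = 2^32 wraps to 0 in 32-bit arithmetic. */
    uint32_t wrapped = memtupsize * allowedMem / memNowUsed;

    /* Widening the multiplication first yields the intended result. */
    uint64_t widened = (uint64_t) memtupsize * allowedMem / memNowUsed;

    printf("wrapped=%u widened=%llu\n", wrapped,
           (unsigned long long) widened);
    return 0;
}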
[HACKERS] pg_ctl reload -o ....
If I want to change a parameter that affects an auxiliary process (like bgwriter), I can usually get away with doing: pg_ctl restart -o '--setting=new' But sometimes I really need to avoid the restart, because it blows away shared_buffers or for other reasons. I can do pg_ctl reload, but that ignores the -o option (without notice). So to go the reload route, I have to edit postgresql.conf before doing the reload. This is quite tedious and error-prone, especially in a script. Is there a reason pg_ctl reload shouldn't honor -o? Is there a reasonable avenue to get it to do so? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timing events WIP v1
Extending this to save the key/value set and most of the other data I mentioned before is pretty straightforward. Why not use Hstore? Seriously? It would require merging Hstore into core, but I don't necessarily see that as a bad thing. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin, Attached is a patch that is still WIP but that I think is getting pretty close to completion. It is not intended to be the be-all and end-all for materialized views, but the minimum useful feature set -- which is all that I've had time to do for this release. In particular, the view is only updated on demand by a complete rebuild. For the next release, I hope to build on this base to allow more eager and incremental updates, and perhaps a concurrent batch update. Nice to see this come in! 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE TABLE AS, with all the same clauses supported. That includes declaring a materialized view to be temporary or unlogged. What use would a temporary matview be? Unlogged is good. 2. MVs don't support inheritance. In which direction? Can't inherit, or can't be inherited from? 3. MVs can't define foreign keys. 4. MVs can't be the target of foreign keys. 5. MVs can't have triggers. Makes sense. 9. MVs can't directly be used in a COPY statement, but can be the source of data using a SELECT. Hmmm? I don't understand the reason for this. 13. pg_class now has a relisvalid column, which is false if an MV is truncated or created WITH NO DATA. You can not scan a relation flagged as invalid. What error would a user see? 14. ALTER MATERIALIZED VIEW is supported for the options that seemed to make sense. For example, you can change the tablespace or schema, but you cannot add or drop columns with ALTER. How would you change the definition of an MV then? 16. To get new data into the MV, the command is LOAD MATERIALIZED VIEW matview_name. This seemed more descriptive to me than the alternatives and avoids declaring any new keywords beyond MATERIALIZED. If the MV is flagged as relisvalid == false, this will change it to true. UPDATE MATERIALIZED VIEW was problematic? Does LOAD automatically TRUNCATE the view before reloading it? If not, why not? It would be good to have some discussion to try to reach a consensus about whether we need to differentiate between *missing* data (where a materialized view has been loaded WITH NO DATA or TRUNCATEd and has not been subsequently LOADed) and potentially *stale* data. If we don't care to distinguish between a view which generated no rows when it ran and one for which the query has not been run, we can avoid adding the relisvalid flag, and we could support UNLOGGED MVs. Perhaps someone can come up with a better solution to that problem. Hmmm. I understand the distinction you're making here, but I'm not sure it actually matters to the user. MVs, by their nature, always have potentially stale data. Being empty (in an inaccurate way) is just one kind of stale data. It would be nice for the user to have some way to know that a matview is empty due to never being LOADed or recently being TRUNCATEd. However, I don't think that the relisvalid flag -- and preventing scanning the relation -- is a good solution. What I'd rather have instead is a timestamp of when the MV was last LOADed. If the MV was never loaded (or was truncated) that timestamp would be NULL. Such a timestamp would allow users to construct all kinds of ad-hoc refresh schemes for MVs which would not be possible without it. I don't see how this relates to UNLOGGED matviews either way. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] another idea for changing global configuration settings from SQL
ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_$SOMETHING); in postinit.c, and have some SQL command to modify this setting. The only thing you couldn't handle that way are SIGHUP settings, but the often-cited use cases work_mem, logging, etc. would work. There would also be the advantage that pg_dumpall would save these settings. I think this is a great idea. One caveat: we really, really, really need a system view which allows DBAs to easily review settings defined for specific users and databases. Right now, it requires significant pg_catalog hacking expertise to pull out user-specific settings. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY
On Wed, Nov 14, 2012 at 10:14 PM, Craig Ringer cr...@2ndquadrant.com wrote: So? You're already handing the keys to the kingdom to anybody who can control the contents of that command line, even if it's only to point at the wrong program. And one man's unexpected side-effect is another man's essential feature, as in my examples above. That's true if they're controlling the whole command, not so much if they just provide a file name. I'm just worried that people will use it without thinking deeply about the consequences, just like they do with untrusted client input in SQL injection attacks. Yeah. If we're going to do this at all, and I'm not convinced it's worth the work, I think it's definitely good to support a variant where we specify exactly the things that will be passed to exec(). There's just too many ways to accidentally shoot yourself in the foot otherwise. If we want to have an option that lets people shoot themselves in the foot, that's fine. But I think we'd be smart not to make that the only option. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 15 November 2012 18:13, Robert Haas robertmh...@gmail.com wrote: Ah, yeah. I wondered in passing about that but forgot to follow up on it. The problem specifically is that the intermediate result memtupsize * newmemtuples might overflow. I believe that the old memtupsize can never be more than 2^26 bytes, because the allocation limit is 1GB and each SortTuple is 16 bytes. Do you mean the intermediate result of memtupsize * allowedMem? Oh, yeah, that could overflow rather easily on a platform where long is only 32-bit. We're multiplying the entire current allocation size of the array by the maximum length. I guess the fact that you didn't spot it made me overconfident. :-) -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On Thu, Nov 15, 2012 at 2:13 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 15 November 2012 18:13, Robert Haas robertmh...@gmail.com wrote: Ah, yeah. I wondered in passing about that but forgot to follow up on it. The problem specifically is that the intermediate result memtupsize * newmemtuples might overflow. I believe that the old memtupsize can never be more than 2^26 bytes, because the allocation limit is 1GB and each SortTuple is 16 bytes. Do you mean the intermediate result of memtupsize * allowedMem? Yes. Oh, yeah, that could overflow rather easily on a platform where long is only 32-bit. Or even 64-bit, with really large values of work_mem. We're multiplying the entire current allocation size of the array by the maximum length. I guess the fact that you didn't spot it made me overconfident. :-) Ha! So what's next here? Do you want to work on these issue some more? Or does Jeff? I'd like to see this go in, but I'm not sure I have the bandwidth to do the legwork myself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ctl reload -o ....
Jeff Janes jeff.ja...@gmail.com writes: Is there a reason pg_ctl reload shouldn't honor -o ? -o means pass these switches on the postmaster's command line. If you're not restarting the postmaster, you don't get to change its command line. IMO setting stuff on the command line is pretty evil anyway. Adjusting postgresql.conf is almost always the Better Way. We have discussions going already about how to make that easier, so I'm not excited about trying to kluge something in pg_ctl. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY
Robert Haas robertmh...@gmail.com writes: Yeah. If we're going to do this at all, and I'm not convinced it's worth the work, I think it's definitely good to support a variant where we specify exactly the things that will be passed to exec(). There's just too many ways to accidentally shoot yourself in the foot otherwise. If we want to have an option that lets people shoot themselves in the foot, that's fine. But I think we'd be smart not to make that the only option. [ shrug... ] Once again, that will turn this from a ten-line patch into hundreds of lines (and some more, different, hundreds of lines for Windows I bet), with a corresponding growth in the opportunities for bugs, for a benefit that's at best debatable. The biggest problem this patch has had from the very beginning is overdesign, and this is more of the same. Let's please just define the feature as popen, not fopen, the given string and have done. You can put all the warning verbiage you want in the documentation. (But note that the server-side version would be superuser-only in any flavor of the feature.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 15 November 2012 19:16, Robert Haas robertmh...@gmail.com wrote: So what's next here? Do you want to work on these issues some more? Or does Jeff? I'd like to see this go in, but I'm not sure I have the bandwidth to do the legwork myself. I'll take another look. No elegant solution immediately occurs to me, though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_trgm partial-match
Hi, I'd like to propose extending pg_trgm so that it can compare a partial-match query key to a GIN index. IOW, I'm thinking of implementing the 'comparePartial' GIN method for pg_trgm. Currently, when the query key is shorter than three characters, we cannot use a GIN index (+ pg_trgm) efficiently, because pg_trgm doesn't support a partial-match method. In this case, a seq scan or index full scan would be executed, and its response time would be very slow. I'd like to alleviate this problem. Note that we cannot do a partial match if KEEPONLYALNUM is disabled, i.e., if the query key contains multibyte characters. In this case, the byte length of the trigram string might be larger than three, and its CRC is used as the trigram key instead of the trigram string itself. Because a CRC is used, we cannot do a partial match. The attached patch extends pg_trgm so that it compares a partial-match query key only when KEEPONLYALNUM is enabled. The attached patch is still WIP. What I should do next is: * bump pg_trgm's version from 1.0 to 1.1, i.e., create pg_trgm--1.1.sql, etc. * write the regression test. Comments? Review? Objections? Regards, -- Fujii Masao trgm_compare_partial_v0.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
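For context, the GIN method in question is the optional comparePartial() support function. Its contract: return 0 when an indexed key matches the partial query key, a negative value to continue the scan without a match, and a positive value to stop the scan. Since GIN walks keys in ascending order starting at the partial key, a prefix test is enough for the trigram case. A simplified sketch of the core comparison, not taken from the attached patch (real pg_trgm keys carry their own representation and length bookkeeping):

    #include <string.h>

    /*
     * Compare a 1- or 2-character partial key against a full trigram key,
     * following the GIN comparePartial() convention.  The scan visits
     * keys in ascending order starting at the partial key, so the first
     * key that does not begin with the partial key ends the scan.
     */
    static int
    compare_partial_trgm(const char *partial, int partial_len,
                         const char *key, int key_len)
    {
        int     minlen = (partial_len < key_len) ? partial_len : key_len;
        int     cmp = memcmp(partial, key, minlen);

        if (cmp == 0 && partial_len <= key_len)
            return 0;       /* key begins with the partial key: match */

        /* Keys only grow from here, so no later key can match: stop. */
        return 1;
    }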
Re: [HACKERS] Dumping an Extension's Script
On Mon, Nov 12, 2012 at 11:00 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Please find attached to this email an RFC patch implementing the basics of the pg_dump --extension-script option. After much discussion around the concept of an inline extension, we decided last year that a good first step would be pg_dump support for an extension's script. Do you have a link to the original thread? I have to confess I don't remember what the purpose of this was and, heh heh, there are no documentation changes in the patch itself either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] support for LDAP URLs
On Mon, Nov 12, 2012 at 10:38 PM, Peter Eisentraut pete...@gmx.net wrote: Here is a patch to support RFC 2255 LDAP URLs in pg_hba.conf. So, instead of, say

    host ... ldap ldapserver=ldap.example.net ldapbasedn="dc=example, dc=net" ldapsearchattribute=uid

you could write

    host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub"

Apache and probably other software uses the same format, and it's easier to have a common format for all such configuration instead of having to translate the information provided by the LDAP admin into each software's particular configuration spellings. I'm using the OpenLDAP-provided URL parsing routine, which means this wouldn't be supported on Windows. But we already support different authentication settings on different platforms, so this didn't seem such a big problem. I think this is broadly reasonable, but I'm not sure this part is a good idea:

    +#ifdef USE_LDAP
    +#ifndef WIN32
    +/* We use a deprecated function to keep the codepath the same as win32. */
    +#define LDAP_DEPRECATED 1
    +#include <ldap.h>
    +#else
    +#include <winldap.h>
    +#endif
    +#endif

Presumably if it's deprecated now, it might go away without notice, and we shouldn't be relying on it to stick around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2012-11-14 at 21:22 -0500, Robert Haas wrote: But there are some practical issues, as Tom points out. Another one is that it's harder for external utilities (like pg_basebackup) to verify checksums. Well, I think the invariant we'd need to maintain is as follows: every page for which the checksum fork might be wrong must have an FPI following the redo pointer. So, at the time we advance the redo pointer, we need the checksum fork to be up-to-date for all pages for which a WAL record was written after the old redo pointer except for those for which a WAL record has again been written after the new redo pointer. In other words, the checksum pages we write out don't need to be completely accurate; the checksums for any blocks we know will get clobbered anyway during replay don't really matter. The issue about external utilities is a bigger problem than I realized at first. Originally, I thought that it was just a matter of code to associate the checksum with the data. However, an external utility will never see a torn page while the system is online (after recovery); but it *will* see an inconsistent view of the checksum and the data if they are issued in separate write() calls. So, the hazard of storing the checksum in a different place is not equivalent to the existing hazard of a torn page. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
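To make the hazard concrete: an external checker necessarily issues separate reads for a block and for its checksum, and nothing orders that pair of reads against the server's pair of writes. A standalone sketch, assuming a purely hypothetical layout of one 32-bit checksum per block in a separate fork file, and using zlib's crc32() as a stand-in for whatever checksum algorithm gets chosen:

    #include <stdint.h>
    #include <unistd.h>
    #include <zlib.h>

    #define BLCKSZ 8192

    /*
     * Verify one block against its checksum fork.  Each pread() is fine
     * on its own, but the pair can straddle the server's two write()
     * calls, so a mismatch seen by an online checker can be a race
     * rather than corruption.
     */
    static int
    verify_block(int data_fd, int cksum_fd, unsigned blkno)
    {
        unsigned char page[BLCKSZ];
        uint32_t      stored;

        if (pread(data_fd, page, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
            return 0;
        if (pread(cksum_fd, &stored, sizeof(stored),
                  (off_t) blkno * sizeof(stored)) != (ssize_t) sizeof(stored))
            return 0;

        return stored == (uint32_t) crc32(0L, page, BLCKSZ);
    }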
Re: [HACKERS] feature proposal - triggers by semantics
Craig Ringer wrote: On 11/15/2012 06:25 PM, Hannu Krosing wrote: On 11/15/2012 09:48 AM, Craig Ringer wrote: If you want to prevent TRUNCATE, deny the privilege or add a trigger that aborts the command. You can abort the transaction, but not skip the action: currently it is only possible to skip in ROW-level triggers. So I'd modify this request to allow BEFORE EACH STATEMENT triggers to also be able to silently skip the current action, as BEFORE EACH ROW triggers can. Then this request would simply be satisfied by a simple trigger which rewrites TRUNCATE into DELETE. That seems sensible to me, too. To further explain ... What I'm desiring here with respect to TRUNCATE is that users are allowed to use TRUNCATE syntax as an alternative to DELETE (they are GRANTed both), but that any triggers defined for DELETE can also be applied to TRUNCATE without too much difficulty. So users can use different syntactic constructs that look similar without having to think about whether the action is going to be audited. I understand that ROW-level triggers don't exist yet for TRUNCATE (this is already clearly documented in the manual) and that adding them would mean TRUNCATE would do a table scan in their presence. I still think the syntax of TRUNCATE FOR EACH ROW would be useful, but if no one agrees, then I'll just make do with the alternative solutions mentioned here. That is, either a statement-level TRUNCATE trigger of some kind, or simply denying the TRUNCATE privilege. Thank you. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On Thu, Nov 15, 2012 at 7:36 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 15 November 2012 19:16, Robert Haas robertmh...@gmail.com wrote: So what's next here? Do you want to work on these issues some more? Or does Jeff? I'd like to see this go in, but I'm not sure I have the bandwidth to do the legwork myself. I'll take another look. No elegant solution immediately occurs to me, though. The overflow was trivial to fix. The only concern I had was about the behaviour after it did the special case: I didn't want it to keep doing the math and trying to grow again by a little bit on every tuple. I think I was leaning towards putting the magic flag back. The alternative is to do the one-off grow only if we can grow by at least some arbitrary percentage, say 10%. But that seems like a lot of arithmetic to be doing each time around for probably no gain. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature proposal - triggers by semantics
On Thu, Nov 15, 2012 at 2:53 PM, Darren Duncan dar...@darrenduncan.net wrote: I still think the syntax of TRUNCATE FOR EACH ROW would be useful, but if no one agrees... I'm compelled to disagree. What was useful about TRUNCATE in the first place was that it quickly operated against the entire table. If you want to change that to row-by-row processing, then it is actually a little bit worse than DELETE FROM some_table, in that it introduces an irregularity in the language that no longer provides any corresponding benefit (e.g., that TRUNCATE is fast). If you want to be certain of doing row-by-row processing, then the better answer is to: a) use DELETE instead in your application, and b) put a guard on to prevent using TRUNCATE (e.g., attach triggers that react to TRUNCATE with "go away, don't bother me!"). I observe that the Slony replication system initially implemented that 'guard' approach when TRUNCATE triggers were first provided, until we came up with a suitable strategy to capture and replicate the TRUNCATE request. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] Dumping an Extension's Script
Robert Haas robertmh...@gmail.com writes: Please find attached to this email an RFC patch implementing the basics of the pg_dump --extension-script option. After much discussion around the concept of an inline extension, we decided last year that a good first step would be pg_dump support for an extension's script. Do you have a link to the original thread? I have to confess I don't remember what the purpose of this was and, heh heh, there are no documentation changes in the patch itself either. My notes include those links to the original thread: http://archives.postgresql.org/message-id/3157.1327298...@sss.pgh.pa.us http://archives.postgresql.org/pgsql-hackers/2012-01/msg01311.php https://commitfest.postgresql.org/action/patch_view?id=746 I could of course work on documenting the changes prior to the review; the thing is that I've been taking a different implementation route towards the pg_dump --extension-script idea we talked about, one that I think is much simpler than anything else. So I'd like to know if that approach is deemed acceptable by the Guardians Of The Code before expending any more hours on this… It basically boils down to this hunk in dumpExtension():

    output "CREATE EXTENSION x WITH … AS $x$"

    /*
     * Have another archive for this extension: this allows us to simply
     * walk the extension's dependencies and use the existing pg_dump code
     * to get the object create statement to be added in the script.
     */
    eout = CreateArchive(NULL, archNull, 0, archModeAppend);
    EH = (ArchiveHandle *) eout;

    /* grab existing connection and remote version information */
    EH->connection = ((ArchiveHandle *) fout)->connection;
    eout->remoteVersion = fout->remoteVersion;

    /* dump all objects for this extension, that have been sorted out in
     * the right order following dependencies etc */
    ...

    /* restore the eout Archive into the local buffer */
    for (te = EH->toc->next; te != EH->toc; te = te->next)
    {
        if (strlen(te->defn) > 0)
            appendPQExpBuffer(q, "%s", te->defn);
    }
    CloseArchive(eout);

    output "$x$;"

What do you think? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timing events WIP v1
Greg Smith wrote: - The event queue is now a simple array accessed in circular fashion. After realizing that output needed to navigate in the opposite order of element addition, I ended up just putting all the queue navigation code directly into the add/output routines. I'd be happy to use a more formal Postgres type here instead (I looked at SHM_QUEUE, for example), but I haven't found anything yet that makes this any simpler. SHM_QUEUE is on death row. Try a dlist from src/backend/lib/ilist.c -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
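For anyone following along, the ilist.h API makes the newest-first traversal Greg needs straightforward. A sketch with illustrative type and field names, not Greg's patch:

    #include "lib/ilist.h"

    typedef struct TimingEvent
    {
        dlist_node  node;       /* embedded doubly-linked list links */
        /* ... event payload fields ... */
    } TimingEvent;

    static dlist_head timing_events = DLIST_STATIC_INIT(timing_events);

    /* Add: newest events go on the head; a bounded queue would
     * dlist_delete() the oldest (tail) entry here when full. */
    static void
    add_event(TimingEvent *ev)
    {
        dlist_push_head(&timing_events, &ev->node);
    }

    /* Output: head-to-tail iteration is newest-to-oldest, with no
     * circular index arithmetic. */
    static void
    output_events(void)
    {
        dlist_iter  iter;

        dlist_foreach(iter, &timing_events)
        {
            TimingEvent *ev = dlist_container(TimingEvent, node, iter.cur);

            (void) ev;          /* placeholder for real output code */
        }
    }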
Re: [HACKERS] pg_ctl reload -o ....
On Thu, Nov 15, 2012 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Janes jeff.ja...@gmail.com writes: Is there a reason pg_ctl reload shouldn't honor -o ? -o means pass these switches on the postmaster's command line. If you're not restarting the postmaster, you don't get to change its command line. OK. You see pg_ctl as a wrapper around postmaster, while I was viewing it as a thing in itself. IMO setting stuff on the command line is pretty evil anyway. I wouldn't do it in a production system, but it is invaluable for performance testing during development or diagnosis. Oh well, I guess this is what perl -i -pe is for. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] User control over psql error stream
Hi, This patch gives the end user control over psql's error stream. This allows a single psql session to use \o to send both errors and table output to multiple files. Useful when capturing test output, etc. Control is provided via a new estream \pset. Here are the docs:

-snip-
estream

Controls the output stream(s) used to report error messages. Value must be one of: stderr (the default), which sends errors to the standard error stream; query, which injects error messages into the query result output stream; or both, which sends errors to both output streams. Error messages are comprised of errors from psql and notice messages and errors from the database server.
-snip-

Against head.

psql-estream.patch: the patch.
psql-estream_test.patch: adds a regression test to test the patch.

There are a number of problems with psql-estream_test.patch, the most notable being that it probably won't work on MS Windows because it uses /dev/null to avoid touching the host filesystem. I'm not sure whether this should have a regression test and, if so, what the right way is to do it. Note that psql-estream.patch includes some re-writing of the docs for the psql \o option that goes slightly beyond the minimum change required to explain \pset estream's effects.

Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index c41593c..695739e 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1800,11 +1800,16 @@ lo_import 152801
     specified, the query output will be reset to the standard output.
     </para>

-    <para><quote>Query results</quote> includes all tables, command
-    responses, and notices obtained from the database server, as
-    well as output of various backslash commands that query the
-    database (such as <command>\d</command>), but not error
-    messages.
+    <para><quote>Query results</quote> includes all tables, sql
+    command responses and status information, notifications from
+    <command>LISTEN</command>, and output of various backslash
+    commands that query the
+    database (such as
+    <command>\d</command>). <quote>Query results</quote> do not
+    include errors from <application>psql</application> or notice
+    messages or errors from the database server unless
+    <command>\pset estream</command> is used. Nor do they include
+    output of backslash commands which do not query the database
+    (such as <command>\h</command>).
     </para>

     <tip>
@@ -1863,16 +1868,18 @@ lo_import 152801
      <listitem>
      <para>
-      This command sets options affecting the output of query result tables.
-      <replaceable class="parameter">option</replaceable>
-      indicates which option is to be set. The semantics of
-      <replaceable class="parameter">value</replaceable> vary depending
-      on the selected option. For some options, omitting <replaceable
-      class="parameter">value</replaceable> causes the option to be toggled
-      or unset, as described under the particular option. If no such
-      behavior is mentioned, then omitting
-      <replaceable class="parameter">value</replaceable> just results in
-      the current setting being displayed.
+      This command sets options affecting the output of errors and
+      query results. There are extensive options regarding table
+      formatting. <replaceable
+      class="parameter">option</replaceable> indicates which option
+      is to be set. The semantics of <replaceable
+      class="parameter">value</replaceable> vary depending on the
+      selected option. For some options, omitting <replaceable
+      class="parameter">value</replaceable> causes the option to be
+      toggled or unset, as described under the particular option.
+      If no such behavior is mentioned, then omitting <replaceable
+      class="parameter">value</replaceable> just results in the
+      current setting being displayed.
      </para>

      <para>
@@ -1914,6 +1921,24 @@ lo_import 152801
      </varlistentry>

      <varlistentry>
+       <term><literal>estream</literal></term>
+       <listitem>
+       <para>
+       Controls the output stream(s) used to report error messages.
+       <replaceable class="parameter">Value</replaceable> must be
+       one of: <literal>stderr</literal> (the default), which sends
+       errors to the standard error stream;
+       <literal>query</literal>, which injects error messages into
+       the query result output stream; or <literal>both</literal>,
+       which sends errors to both output streams. <quote>Error
+       messages</quote> are comprised of errors from
+       <application>psql</application> and notice messages and
+       errors from the database
Re: [HACKERS] Further pg_upgrade analysis for many tables
Alvaro Herrera alvhe...@2ndquadrant.com writes: Could we use some adaptive mechanism here? Say we use a list for the first ten entries, and if an eleventh one comes in, we create a hash table for that one and all subsequent ones. All future calls would have to examine both the list for the first few and then the hash table. Is it necessary to do so? Do we know for sure that a 10-element hash table is slower than a 10-element list when doing only key-based lookups, for the object data type we're interested in here? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ctl reload -o ....
On 11/15/2012 03:41 PM, Jeff Janes wrote: On Thu, Nov 15, 2012 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Janes jeff.ja...@gmail.com writes: Is there a reason pg_ctl reload shouldn't honor -o ? -o means pass these switches on the postmaster's command line. If you're not restarting the postmaster, you don't get to change its command line. OK. You see pg_ctl as a wrapper around postmaster, while I was viewing it as a thing in itself. You're possibly not aware of its history. Before 8.0 it was a shell script and its use was seen as rather optional. IIRC there was even some debate about whether or not we needed it at all for the original Windows port (luckily Bruce and I thought we did, and made it happen.) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 15 November 2012 19:54, Greg Stark st...@mit.edu wrote: The only concern I had was about the behaviour after it did the special case. I didn't want it to keep doing the math and trying to grow again a little bit every tuple. I think I was leaning to putting the magic flag back. The alternative might just be to add a new constant to the TupSortStatus enum. That might be more logical. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Partial-index predicate proofs got dumber in 9.2
I looked into the optimization regression reported here: http://archives.postgresql.org/pgsql-performance/2012-11/msg00140.php It's easy to reproduce the problem in the regression database:

    create index ti on tenk1 (fivethous) where fivethous is not null;
    explain select * from int4_tbl, tenk1 where f1 = fivethous;

9.1 and earlier will produce a nestloop-with-inner-indexscan, but 9.2 fails to (and ends up with a much-more-expensive hash join) because it doesn't think the partial index ti has been proven usable for the query. However, because the = operator is strict, no row in which fivethous is null could be relevant to the query result, so we should be able to use the partial index. The older code succeeds at this because when it is considering whether it can use an index for an inner indexscan, it will apply all the join clauses found by find_clauses_for_join() to the task of proving the index predicate true. (It's notable that this includes all join clauses mentioning the target relation, not only those that can be used with the index's columns; so we'll be able to prove things true even if the index predicate involves table columns that aren't in the index proper.) In 9.2, which has been rewritten to generate inner indexscans bottom-up as parameterized paths, we simply skip generating parameterized paths for any index that's not marked predOK, and the predOK test is made using only the restriction clauses for that relation, not join clauses. Ooops. We could try to replicate the way the previous code did it, but it'd probably add significant expense. What I'm thinking at the moment is that check_partial_indexes() has been missing a bet all along, because there is no reason for it to confine its attention to restriction clauses. We should just have it include join clauses for the rel in the restriction list passed to predicate_implied_by(). That is, a join clause can be assumed true for the purposes of deciding whether an index is usable *whether or not we ever actually use that join clause with the index*. This claim is shaky in the presence of outer joins, though. Consider

    select ... from t1 left join t2 on (t1.a = t2.b)

We can use a partial index on t2 that requires t2.b IS NOT NULL, since no row where b is null will affect the result. A more interesting case is

    select ... from t1 left join t2 on (t1.a = t2.b) where t2.c > 0;

This is actually going to get simplified to a plain inner join, but even if it did not, we needn't fetch t2 rows where c is null or negative. That might result in generating some null-extended join rows that shouldn't be there, if particular values of t1.a no longer have matches, but the outer WHERE clause would eliminate such bogus rows anyway. However, that last conclusion depends on the outer WHERE clause being strict, which doesn't work for instance with

    select ... from t1 left join t2 on (t1.a = t2.b) where t2.c is null;

We could not use an index having the predicate c is null to scan t2, else we might eliminate some rows that should produce matches to t1, allowing bogus null-extended rows to be produced (which would survive the outer WHERE). So I'm thinking that the correct heuristic for check_partial_indexes() is to use restriction clauses plus any join clauses that satisfy join_clause_is_movable_to(). That should prevent unsafe use of upper-level join clauses when there are outer joins below them. Thoughts?
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
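In code terms, the proposed heuristic might amount to something like the following shape for check_partial_indexes() (indxpath.c). This is a sketch against the 9.2-era planner structures, not necessarily the fix as committed:

    static void
    check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
    {
        List       *clauselist;
        ListCell   *lc;

        /* Start with the relation's own restriction clauses, as now... */
        clauselist = list_copy(rel->baserestrictinfo);

        /*
         * ...but also assume any join clause that could validly be
         * evaluated at this relation, whether or not it is ever used
         * with the index.  join_clause_is_movable_to() rejects clauses
         * coming from above outer joins, avoiding the unsafe cases
         * discussed above.
         */
        foreach(lc, rel->joininfo)
        {
            RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

            if (join_clause_is_movable_to(rinfo, rel->relid))
                clauselist = lappend(clauselist, rinfo);
        }

        foreach(lc, rel->indexlist)
        {
            IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

            if (index->indpred == NIL)
                continue;       /* not a partial index */

            index->predOK = predicate_implied_by(index->indpred,
                                                 clauselist);
        }
    }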
Re: [HACKERS] pg_ctl reload -o ....
Andrew Dunstan and...@dunslane.net writes: On 11/15/2012 03:41 PM, Jeff Janes wrote: OK. You see pg_ctl as a wrapper around postmaster, while I was viewing it as a thing in itself. You're possibly not aware of its history. Before 8.0 it was a shell script and its use was seen as rather optional. It's *still* rather optional. A lot of distros use init scripts that don't bother with it. Personally I find it convenient for stopping the postmaster, but not all that helpful for starting it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] another idea for changing global configuration settings from SQL
Peter Eisentraut pete...@gmx.net writes: The existing infrastructure would also support any user, any database (= all the time) There would also be the advantage that pg_dumpall would save these settings. Thoughts? That's brilliant. +1. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature proposal - triggers by semantics
Darren Duncan dar...@darrenduncan.net writes: So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH ROW Kevin has been proposing that we consider an alternative approach in some other cases that I think would work better for you, too. Namely, to have access to OLD and NEW in FOR EACH STATEMENT triggers, where they would be relations rather than records. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
Peter Geoghegan pe...@2ndquadrant.com writes: should expose the hash. The need to aggregate historical statistics just doesn't appreciably alter things here, I feel. The number of discrete queries that an application will execute in a week just isn't that different from the number that it will ever execute, I suspect. Please don't forget that some people won't write the most effective SQL from the get-go and will rewrite problematic queries. Some people will even roll out new features in their applications, with new queries to implement them. Or new applications on top of the existing database. So I don't think that the query list is that static. That said, I don't have any idea at all about the impact of what I'm saying on your analysis… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further pg_upgrade analysis for many tables
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Alvaro Herrera alvhe...@2ndquadrant.com writes: Could we use some adaptive mechanism here? Say we use a list for the first ten entries, and if an eleventh one comes in, we create a hash table for that one and all subsequent ones. All future calls would have to examine both the list for the first few and then the hash table. Is it necessary to do so? Do we know for sure that a 10 elements hash table is slower than a 10 elements list when only doing key based lookups, for the object data type we're interested into here? Well, we'd want to do some testing to choose the cutover point. Personally I'd bet on that point being quite a bit higher than ten, for the case that sequence.c is using where the key being compared is just an OID. You can compare a lot of OIDs in the time it takes dynahash.c to do something. (I think the above sketch is wrong in detail, btw. What we should do once we decide to create a hash table is move all the existing entries into the hash table, not continue to scan a list for them. There's a similar case in the planner for tracking join RelOptInfos.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
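A standalone sketch of the shape being discussed, with a toy chained hash and an illustrative cutover constant (the real value would need the benchmarking Tom mentions). Per Tom's refinement, the cutover moves every existing entry into the table, so lookups never have to consult both structures:

    #include <stdlib.h>
    #include <string.h>

    #define CUTOVER  32         /* illustrative; needs benchmarking */
    #define NBUCKETS 256

    typedef struct Entry
    {
        unsigned      key;      /* e.g. a sequence OID */
        struct Entry *next;     /* hash chain link; unused in array mode */
        /* ... cached per-object state ... */
    } Entry;

    static Entry  small_tab[CUTOVER];
    static int    nsmall = 0;
    static Entry *buckets[NBUCKETS];
    static int    hashed = 0;   /* have we cut over yet? */

    static void
    hash_add(Entry *e)
    {
        Entry **bucket = &buckets[e->key % NBUCKETS];

        e->next = *bucket;
        *bucket = e;
    }

    Entry *
    lookup(unsigned key)
    {
        if (hashed)
        {
            for (Entry *e = buckets[key % NBUCKETS]; e != NULL; e = e->next)
                if (e->key == key)
                    return e;
            return NULL;
        }
        for (int i = 0; i < nsmall; i++)
            if (small_tab[i].key == key)
                return &small_tab[i];
        return NULL;
    }

    Entry *
    add(unsigned key)
    {
        Entry *e;

        if (!hashed && nsmall == CUTOVER)
        {
            /*
             * Cut over: move all existing entries into the table.  (A
             * list-based caller such as sequence.c would relink its
             * malloc'd entries instead of copying, so no previously
             * handed-out pointer would go stale.)
             */
            for (int i = 0; i < nsmall; i++)
            {
                e = malloc(sizeof(Entry));
                *e = small_tab[i];
                hash_add(e);
            }
            hashed = 1;
        }

        if (hashed)
        {
            e = calloc(1, sizeof(Entry));
            e->key = key;
            hash_add(e);
            return e;
        }

        e = &small_tab[nsmall++];
        memset(e, 0, sizeof(*e));
        e->key = key;
        return e;
    }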
Re: [HACKERS] add -Wlogical-op to standard compiler options?
On 11/15/12 9:40 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I think it might be worth adding -Wlogical-op to the standard warning options (for supported compilers, determined by configure test). Does that add any new warnings with the current source code, and if so what? none -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature proposal - triggers by semantics
Dimitri Fontaine wrote: Darren Duncan dar...@darrenduncan.net writes: So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH ROW Kevin has been proposing that we consider an alternative approach in some other cases that I think would work better for you, too. Namely, to have access to OLD and NEW in FOR EACH STATEMENT triggers, where they would be relations rather than records. Regards, Yes, I believe that would work very well. In fact, that would provide some power features. I look forward to this, probably the best solution. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add -Wlogical-op to standard compiler options?
Peter Eisentraut pete...@gmx.net writes: On 11/15/12 9:40 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I think it might be worth adding -Wlogical-op to the standard warning options (for supported compilers, determined by configure test). Does that add any new warnings with the current source code, and if so what? none No objection from me then. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] another idea for changing global configuration settings from SQL
On Thu, Nov 15, 2012 at 6:53 PM, Peter Eisentraut pete...@gmx.net wrote: Independent of the discussion of how to edit configuration files from SQL, I had another idea for how many of the use cases for this could be handled. We already have the ability to store in pg_db_role_setting configuration settings for:

- specific user, specific database
- specific user, any database
- any user, specific database

The existing infrastructure would also support:

- any user, any database (= all the time)

All you'd need is to add ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_$SOMETHING); in postinit.c, and have some SQL command to modify this setting. The only thing you couldn't handle that way are SIGHUP settings, but the often-cited use cases work_mem, logging, etc. would work. How hard would it be to make it work for SIGHUP? I can see how it would be impossible to handle things like POSTMASTER, but SIGHUP seems like it should be doable somehow? There would also be the advantage that pg_dumpall would save these settings. Thoughts? I like it. Not as a replacement for the other facility, but as another way of doing it. And I'd expect it could be the main way for manual changes, but tools would still need access to the other way, of course. We probably need to enhance pg_settings to tell the user *where* the setting came from when it's set this way. In fact, we need this already, since it can be hard to track down... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
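Concretely, the hook point already exists. A sketch of process_settings() in postinit.c with the extra call added; PGC_S_GLOBAL is a hypothetical name standing in for the PGC_S_$SOMETHING placeholder above, and the surrounding code is reconstructed from memory of the 9.2-era sources rather than quoted from them:

    static void
    process_settings(Oid databaseid, Oid roleid)
    {
        Relation    relsetting;

        if (!IsUnderPostmaster)
            return;

        relsetting = heap_open(DbRoleSettingRelationId, AccessShareLock);

        /* New: any user, any database, i.e. applies all the time */
        ApplySetting(InvalidOid, InvalidOid, relsetting, PGC_S_GLOBAL);

        /* Existing per-database, per-role, and combined settings */
        ApplySetting(databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
        ApplySetting(InvalidOid, roleid, relsetting, PGC_S_USER);
        ApplySetting(databaseid, InvalidOid, relsetting, PGC_S_DATABASE);

        heap_close(relsetting, AccessShareLock);
    }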
Re: [HACKERS] Index only scans wiki page
On 13 November 2012 01:25, Peter Geoghegan pe...@2ndquadrant.com wrote: Attached is a piece I wrote on the feature. That might form the basis of a new wiki page. Since no one else moved on this, I've completely replaced the existing page with the contents of the user-orientated document I posted. I don't believe that any of the information that has been removed was of general interest, since it only existed as a developer-orientated page. If anyone thinks that I shouldn't have removed everything that was there, or indeed who would like to change what I've added, well, it's a wiki. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum truncate exclusive lock round two
Jan Wieck janwi...@yahoo.com writes: Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c to periodically check whether there is a conflicting lock request waiting. If not, keep going. If there is a waiter, truncate the relation to the point checked thus far, release the AccessExclusiveLock, then loop back to where we acquire this lock in the first place and continue checking/truncating. I think that maybe we could just bail out after releasing the AccessExclusiveLock and trust autovacuum to get back to truncating that relation later. That would allow removing 2 of the 3 GUCs below:

    autovacuum_truncate_lock_check = 100ms  # how frequently to check
                                            # for conflicting locks

This is the one remaining. Could we maybe check for a lock conflict after every page we move backward, or some multiple thereof? The goal would be to ensure that progress is made, while also being aware of concurrent activity, ala CHECK_FOR_INTERRUPTS(). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
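In code, the per-page check could look roughly like this inside count_nondeletable_pages() (vacuumlazy.c). LockHasWaitersRelation() is the proposed lmgr call, and the 32-page stride, the 100 ms figure, and the return convention are all illustrative assumptions, not taken from Jan's patch:

    instr_time  starttime, currenttime, elapsed;

    INSTR_TIME_SET_CURRENT(starttime);

    while (blkno > vacrelstats->nonempty_pages)
    {
        blkno--;

        /* Check for waiters every so many pages, ala CHECK_FOR_INTERRUPTS() */
        if ((blkno % 32) == 0)
        {
            INSTR_TIME_SET_CURRENT(currenttime);
            elapsed = currenttime;
            INSTR_TIME_SUBTRACT(elapsed, starttime);

            if (INSTR_TIME_GET_MILLISEC(elapsed) >= 100 &&
                LockHasWaitersRelation(onerel, AccessExclusiveLock))
            {
                /*
                 * Someone is queued behind our AccessExclusiveLock: stop
                 * here.  The caller truncates to this point, releases the
                 * lock, and either reacquires it and resumes or leaves
                 * the remainder for a later vacuum.
                 */
                return blkno + 1;
            }
            starttime = currenttime;
        }

        /* ... examine the page; stop at the first one with live tuples ... */
    }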
Re: [HACKERS] autovacuum truncate exclusive lock round two
Dimitri Fontaine wrote: Jan Wieck janwi...@yahoo.com writes: Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c to periodically check, if there is a conflicting lock request waiting. If not, keep going. If there is a waiter, truncate the relation to the point checked thus far, release the AccessExclusiveLock, then loop back to where we acquire this lock in the first place and continue checking/truncating. I think that maybe we could just bail out after releasing the AccessExclusiveLock and trust autovacuum to get back to truncating that relation later. That doesn't work, because the truncating code is not reached unless vacuuming actually took place. So if you interrupt it, it will just not get called again later. Maybe we could have autovacuum somehow invoke that separately, but that would require that the fact that truncation was aborted is kept track of somewhere. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers