Re: [HACKERS] estimating # of distinct values
On 20.01.2011 04:36, Robert Haas wrote: ... Even better, the code changes would be confined to ANALYZE rather than spread out all over the system, which has positive implications for robustness and likelihood of commit. Keep in mind that the administrator can already override the ndistinct estimate with ALTER TABLE. If he needs to manually run a special ANALYZE command to make it scan the whole table, he might as well just use ALTER TABLE to tell the system what the real (or good enough) value is. A DBA should have a pretty good feeling of what the distribution of his data is like. And how good does the estimate need to be? For a single-column, it's usually not that critical, because if the column has only a few distinct values then we'll already estimate that pretty well, and OTOH if ndistinct is large, it doesn't usually affect the plans much if it's 10% of the number of rows or 90%. It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate? I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
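For reference, the per-column override Heikki refers to looks like this (table and column names are placeholders); a positive value sets the estimate directly, a negative value between -1 and 0 is taken as a fraction of the row count, and either form is picked up at the next ANALYZE:

    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 5000);
    ALTER TABLE orders ALTER COLUMN status SET (n_distinct = -0.05);  -- 5% of the rows
    ANALYZE orders;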
Re: [HACKERS] Extending opfamilies for GIN indexes
Tom Lane t...@sss.pgh.pa.us writes: Actually the other way around. An opclass is the subset of an opfamily that is tightly bound to an index. The build methods have to be associatable with an index, so they're part of the index's opclass. The query methods could be loose in the opfamily. I had understood your proposal to change that for GIN. Thinking again now, keeping opfamily and opclass as they are: an opclass is the code we run to build and scan the index, and an opfamily is a way to use the same index data and code in more contexts than an opclass strictly covers. The planner's not the problem here --- what's missing is the rule for the index AM to look up the right support functions to call at runtime. The trick is to associate the proper query support methods with any given query operator (which'd also be loose in the family, probably). The existing schema for pg_amop and pg_amproc is built on the assumption that the amoplefttype/amoprighttype are sufficient for making this association; but that seems to fall down if we would like to allow contrib modules to add new query operators that coincidentally take the same input types as an existing opfamily member. Well, the opfamily machinery already makes it possible to give query support to any index whose opclass is in the family; that is, the same set of operators is covered by more than one opclass. What we want to add is that more than one set of operators can find data support in more than one kind of index. But you still want to run specific search code here. So it seems to me we shouldn't attack the problem at the operator left/right type level, but rather model the extra level of flexibility we need, somewhat separating the building and maintenance of the index data from the code that's used to access it. The example we're working from seems to be covered if we can instruct PostgreSQL that a set of opclasses is binary coercible, if that's the right term here. The idea would then be for PostgreSQL to figure out that a given index can be used with any binary-coercible opclass, rather than only the one used to maintain it. What do you think? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
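To make the sticking point concrete, a rough sketch of what a contrib module would like to do: add its own query operator loosely to an existing GIN opfamily, with the same input types as existing members (the operator name and strategy number here are invented for illustration):

    ALTER OPERATOR FAMILY array_ops USING gin
        ADD OPERATOR 5  @@@ (anyarray, anyarray);
    -- pg_amop can record the new operator, but pg_amproc lookups key on
    -- (amoplefttype, amoprighttype), so there is no way to pair the operator
    -- with its own extractQuery/consistent support rather than the opclass's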
Re: [HACKERS] WIP: RangeTypes
New patch. I added a lot of generic range functions, and a lot of operators. There is still more work to do, this is just an updated patch. The latest can be seen on the git repository, as well: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Regards, Jeff Davis [Attachment: rangetypes-20110119.gz] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in
2011/1/20 Stephen Frost sfr...@snowman.net: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 19, 2011 at 6:04 PM, Stephen Frost sfr...@snowman.net wrote: I'm going to mark this returned to author with feedback. That implies you don't think it should be considered further for this CommitFest. Perhaps you mean Waiting on Author? I did, actually, and that's what I actually marked it as in the CF. Sorry for any confusion. When I went to mark it in CF, I realized my mistake. ok :), I'll look on it tomorrow. regards Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] REPLICATION privilege and shutdown
Hi, Both the user with REPLICATION privilege and the superuser can call pg_stop_backup. But only superuser can connect to the server to cancel online backup during shutdown. The non-superuser with REPLICATION privilege cannot. Is this behavior intentional? Or just oversight? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REPLICATION privilege and shutdown
On 20.01.2011 11:41, Fujii Masao wrote: Hi, Both the user with REPLICATION privilege and the superuser can call pg_stop_backup. But only superuser can connect to the server to cancel online backup during shutdown. The non-superuser with REPLICATION privilege cannot. Is this behavior intentional? Or just oversight? I think we need to consider the situation after the multiple streaming base backups patch goes in. After that we can change pg_stop_backup() so that you need superuser privileges to run it again - replication privileges is enough to do a streaming base backup, but that no longer interferes with the pg_start/stop_backup() admin functions. At the moment, a streaming base backup and manual pg_start/stop_backup() use the same machinery, so it's possible e.g to run pg_stop_backup() while a streaming base backup is running, causing it to fail at the end. Or worse, you can run pg_stop_backup()+pg_start_backup(), and the streaming base backup will seemingly succeed, but the produced backup is potentially corrupt. The multiple base backups patch will fix that too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
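Spelled out, the interleaving Heikki describes looks roughly like this (assume a streaming base backup, which did the equivalent of pg_start_backup(), is already running on a replication connection):

    -- in another session, while the base backup is streaming:
    SELECT pg_stop_backup();          -- ends the backup the walsender started;
                                      -- the streaming backup fails at the end
    SELECT pg_start_backup('oops');   -- with this as well, the streaming backup
                                      -- appears to succeed, but the copied data
                                      -- may be inconsistent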
Re: [HACKERS] REPLICATION privilege and shutdown
On Thu, Jan 20, 2011 at 10:50, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 20.01.2011 11:41, Fujii Masao wrote: Hi, Both the user with REPLICATION privilege and the superuser can call pg_stop_backup. But only superuser can connect to the server to cancel online backup during shutdown. The non-superuser with REPLICATION privilege cannot. Is this behavior intentional? Or just oversight? I think we need to consider the situation after the multiple streaming base backups patch goes in. After that we can change pg_stop_backup() so that you need superuser privileges to run it again - replication privileges is enough to do a streaming base backup, but that no longer interferes with the pg_start/stop_backup() admin functions. At the moment, a streaming base backup and manual pg_start/stop_backup() use the same machinery, so it's possible e.g to run pg_stop_backup() while a streaming base backup is running, causing it to fail at the end. Or worse, you can run pg_stop_backup()+pg_start_backup(), and the streaming base backup will seemingly succeed, but the produced backup is potentially corrupt. The multiple base backups patch will fix that too. Yeah; I've been avoiding to even think about that one pending the multiple base backups, onthe assumption that it goes in before release. If it doesn't, for some reason, that needs to be revisited, obviously. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REPLICATION privilege and shutdown
On Thu, Jan 20, 2011 at 6:50 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we need to consider the situation after the multiple streaming base backups patch goes in. After that we can change pg_stop_backup() so that you need superuser privileges to run it again - replication privileges is enough to do a streaming base backup, but that no longer interferes with the pg_start/stop_backup() admin functions. Fair enough. At the moment, a streaming base backup and manual pg_start/stop_backup() use the same machinery, so it's possible e.g to run pg_stop_backup() while a streaming base backup is running, causing it to fail at the end. Or worse, you can run pg_stop_backup()+pg_start_backup(), and the streaming base backup will seemingly succeed, but the produced backup is potentially corrupt. The multiple base backups patch will fix that too. OK. BTW, I found this behavior when I read your patch ;) Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: First, I'd like to note that the thread for this patch had *four* me-too responses to the use case. That's extremely unusual; the subject is definitely compelling to people. It addresses the bad behavior of natural attempts to atomically swap two tables in the namespace: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' # Do it this way, and get: ERROR: could not open relation with OID 41380 psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' by letting you do this instead: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock psql -c 'EXCHANGE TABLE new_t TO t psql -c 'SELECT * FROM t' # I get 'new', finally! psql -c 'DROP TABLE IF EXISTS t, new_t' I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread interesting: can we just make the first example work? Even granting that the second syntax may be a useful addition, the existing behavior of the first example is surely worthless, even actively harmful. I tossed together a proof-of-concept patch, attached, that makes the first example DTRT. Do you see any value in going down that road? As I said previously on the thread you quote, having this happen implicitly is not a good thing, and IMHO, definitely not the right thing. Heikki's suggestion, and your patch, contain no checking to see whether the old and new tables are similar. If they are not similar then we have all the same problems raised by my patch. SQL will suddenly fail because columns have ceased to exist, FKs suddenly disappear etc.. I don't see how having a patch helps at all. I didn't think it was the right way before you wrote it and I still disagree now you've written it. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
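To make the similarity concern concrete, a sketch of what can go wrong if the swap happens implicitly with no check that the two tables match (the REPLACE WITH spelling is the one proposed in this thread):

    CREATE TABLE t (id int PRIMARY KEY, payload text NOT NULL);
    CREATE TABLE new_t (id int);          -- no payload column, no primary key
    ALTER TABLE t REPLACE WITH new_t;     -- proposed syntax
    -- queries and views referencing t.payload now fail, and anything that
    -- relied on t's primary key or foreign keys is silently left without them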
Re: [HACKERS] Transaction-scope advisory locks
On 2011-01-17 9:28 AM +0200, Itagaki Takahiro wrote: Here is a short review for Transaction scoped advisory locks: https://commitfest.postgresql.org/action/patch_view?id=518 Thanks for reviewing! == Features == The patch adds pg_[try_]advisory_xact_lock[_shared] functions. The function names follows the past discussion -- it's better than bool isXact argument or changing the existing behavior. == Coding == I expect documentation will come soon. I'm sorry about this, I have been occupied with other stuff. I'm going to work on this tonight. There is no regression test, but we have no regression test for advisory locks even now. Tests for lock conflict might be difficult, but we could have single-threaded test for lock/unlock and pg_locks view. Seems useful. == Questions == I have a question about unlocking transaction-scope advisory locks. We cannot unlock them with pg_advisory_unlock(), but can unlock with pg_advisory_unlock_all(). It's inconsistent behavior. Furthermore, I wonder we can allow unlocking transaction-scope locks -- we have LOCK TABLE but don't have UNLOCK TABLE. I guess we could add new pg_advisory_txn_unlock() functions to unlock transaction-scope locks, but I do share your doubt on whether or not we want to allow this at all. On the other hand, the reasons why we don't allow non-advisory locks to be unreleased is a lot more clear than the issue at hand. I have no strong opinion on this. Another thing I now see is this: BEGIN; SELECT pg_advisory_xact_lock(1); -- do something here -- upgrade to session lock SELECT pg_advisory_lock(1); COMMIT; This seems useful, since the xact lock would be automatically released if an error happens during -- do something here so you wouldn't need to worry about releasing the lock elsewhere. But I'm not sure this is safe. Can anyone see a problem with it? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
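A single-backend regression test of the sort suggested could be as simple as this (function names per the patch; both locks should disappear from pg_locks at COMMIT):

    BEGIN;
    SELECT pg_advisory_xact_lock(10);
    SELECT pg_try_advisory_xact_lock_shared(10);
    SELECT locktype, classid, objid, mode, granted
      FROM pg_locks WHERE locktype = 'advisory' ORDER BY mode;
    COMMIT;
    SELECT count(*) FROM pg_locks WHERE locktype = 'advisory';  -- expect 0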
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On 19.01.2011 16:01, Joachim Wieland wrote: On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here are the latest patches all of them also rebased to current HEAD. Will update the commitfest app as well. What's the idea of storing the file sizes in the toc file? It looks like it's not used for anything. It's part of the overall idea to make sure files are not inadvertently exchanged between different backups and that a file is not truncated. In the future I'd also like to add a checksum to the TOC so that a backup can be checked for integrity. This will cost performance but with the parallel backup it can be distributed to several processors. Ok. I'm going to leave out the filesize. I can see some value in that, and the CRC, but I don't want to add stuff that's not used at this point. It would be nice to have this format match the tar format. At the moment, there's a couple of cosmetic differences: * TOC file is called TOC, instead of toc.dat * blobs TOC file is called BLOBS.TOC instead of blobs.toc * each blob is stored as blobs/oid.dat, instead of blob_oid.dat That can be done easily... The only significant difference is that in the directory archive format, each data file has a header in the beginning. What are the benefits of the data file header? Would it be better to leave it out, so that the format would be identical to the tar format? You could then just tar up the directory to get a tar archive, or vice versa. The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. Now that we have an API for compression in compress_io.c, it probably wouldn't be very hard to implement the missing compression support to tar format either. If you want to drop the header altogether, fine with me but if it's just for the tar- directory conversion, then I am failing to see what the use case of that would be. A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
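The "dump uncompressed, compress later" workflow that would enable, roughly (the option letters for the new directory format are assumed here, and whether a plain gzip'd data file is accepted depends on the header handling discussed downthread):

    pg_dump -F d -Z 0 -f /backups/mydb.dir mydb     # fast, uncompressed directory dump
    gzip /backups/mydb.dir/[0-9]*.dat               # compress the per-table data files, leave the TOC alone
    pg_restore -d mydb_copy /backups/mydb.dir       # the reader recognizes the .gz suffix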
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 05:23, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander mag...@hagander.net wrote: Great. Thanks for the quick update! Here are another comments: Here are comments against the documents. The other code looks good. Thanks! It's helpful to document what to set to allow pg_basebackup connection. That is not only the REPLICATION privilege but also max_wal_senders and pg_hba.conf. Hmm. Yeha, i guess that wouldn't hurt. Will add that. + refsect1 + titleOptions/title Can we list the descriptions of option in the same order as pg_basebackup --help does? It's helpful to document that the target directory must be specified and it must be empty. Yeah, that's on the list - I just wanted to make any other changes first before I did that. I based on (no further) feedback and a few extra questions, I'm going to change it per your suggestion to use -D dir -F format, instead of -D/-T, which will change that stuff anyway. So I'll reorder them at that time. + para + The backup will include all files in the data directory and tablespaces, + including the configuration files and any additional files placed in the + directory by third parties. Only regular files and directories are allowed + in the data directory, no symbolic links or special device files. The latter sentence means that the backup of the database cluster created by initdb -X is not supported? Because the symlink to the actual WAL directory is included in it. No, it's not. pg_xlog is specifically excluded, and sent as an empty directory, so upon restore you will have an empty pg_xlog directory. OTOH, I found the following source code comments: + * Receive a tar format stream from the connection to the server, and unpack + * the contents of it into a directory. Only files, directories and + * symlinks are supported, no other kinds of special files. This says that symlinks are supported. Which is true? Is the symlink supported only in tar format? That's actually a *backend* side restriction. If there is a symlink anywhere other than pg_tblspc in the data directory, we simply won't send it across (with a warning). The frontend code supports creating symlinks, both in directory format and in tar format (actually, in tar format it doesn't do anything, of course, it just lets it through) It wouldn't actually be hard to allow the inclusion of symlinks in the backend side. But it would make verification a lot harder - for example, if someone symlinked out pg_clog (as an example), we'd back up the symlink but not the actual files since they're not actually registered as a tablespace. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
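For the documentation bit, the server-side prerequisites amount to roughly this (values are examples only):

    # postgresql.conf
    wal_level = archive        # anything above 'minimal'
    max_wal_senders = 3        # > 0; a base backup occupies one walsender slot

    # pg_hba.conf -- pg_basebackup connects to the pseudo-database "replication"
    host  replication  backupuser  192.0.2.0/24  md5

    -- and a role carrying the REPLICATION privilege
    CREATE ROLE backupuser REPLICATION LOGIN PASSWORD 'secret';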
Re: [HACKERS] Include WAL in base backup
On Thu, Jan 20, 2011 at 05:03, Stephen Frost sfr...@snowman.net wrote: Greetings, * Magnus Hagander (mag...@hagander.net) wrote: For now, you need to set wal_keep_segments to make it work properly, but if you do the idea is that the tar file/stream generated in the base backup will include all the required WAL files. Is there some reason to not ERROR outright if we're asked to provide WAL and wal_keep_segments isn't set..? I'd rather do that than only ERROR when a particular WAL is missing.. That could lead to transient backup errors that an inexperienced sysadmin or DBA might miss or ignore. They'll notice if it doesn't work the first time they try it and spits out a hint about wal_keep_segments. Well, in a smaller:ish database you can easily do the full backup before you run out of segments in the data directory even when you haven't set wal_keep_segments. If we error out, we force extra work on the user in the trivial case. I'd rather not change that, but instead (as Fujii-san has also mentioned is needed anyway) put some more effort into documenting in which cases you need to set it. I've got some refactoring I want to do around the SendBackupDirectory() function after this, but a review of the functionality first would be good. And obviously, documentation is still necessary. mkay, I'm not going to try to make this ready for committer, but will provide my comments on it overall. Bit difficult to review when someone else is reviewing the base patch too. :/ Heh, yeah. Here goes: - I'm not a huge fan of the whole 'closetar' option, that feels really rather wrong to me. Why not just open it and close it in perform_base_backup(), unconditionally? Yeah, we could move the whole thing up there. Or, as I mentioned in an IM conversation with Heikki, just get rid of SendBackupDirectory() completely and inline it inside the loop in perform_base_backup(). Given that it's basically just 5 lines + a call to sendDir().. - I wonder if you're not getting to a level where you shold be using a struct to pass the relevant information to perform_base_backup() instead of adding more arguments on.. That's going to get unwieldy at some point. Yeah, probably. We *could* pass the BaseBackupCmd struct from the parser all the way in - or is that cheating too much on abstractions? It seems if we don't, we're just going to hav ea copy of that struct without the NodeTag member.. - Why not initialize logid and logseg like so?: int logid = startptr.xlogid; int logseg = startptr.xrecoff / XLogSegSize; Then use those in your elog? Seems cleaner to me. Hmm. Yes. Agreed. - A #define right in the middle of a while loop...? Really? Haha, yeah, that was a typo. I didn't remember the name of the variable so I stuck it there for testing and forgot it. It should be ThisTimeLineID, and no #define at all. - The grammar changes strike me as.. odd. Typically, you would have an 'option' production that you can then have a list of and then let each option be whatever the OR'd set of options is. Wouldn't the current grammar require that you put the options in a specific order? That'd blow. It does require them in a specific order. The advantage is that it makes the code easier. and it's not like end users are expected to run them anyway... Now, I'm no bison export, so it might be an easy fix. But the way I could figure, to make them order indepdent I have to basically collect them up together as a List instead of just as a struct, and then loop through that list to build a struct later. 
If someone who knows Bison better can tell me a neater way to do that, I'll be happy to change :-) @@ -687,7 +690,7 @@ BaseBackup() * once since it can be relocated, and it will be checked before we do * anything anyway. */ - if (basedir != NULL i 0) + if (basedir != NULL !PQgetisnull(res, i, 1)) verify_dir_is_empty_or_create(PQgetvalue(res, i, 1)); } - Should the 'i 0' conditional above still be there..? No. That's a cheat-check that assumes the base directory is always sent first. Which is not true anymore - with this patch we always send it *last* so we can include the WAL in it. So, that's my review from just reading the source code and the thread.. Hope it's useful, sorry it's not more. :/ Thanks - it certainly is! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
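For what it's worth, a rough sketch of the List-based approach (token and option names improvised; makeDefElem/lappend as used elsewhere in the backend grammars), in which the options can appear in any order and a later loop folds the list into the struct:

    base_backup_opt_list:
          base_backup_opt_list base_backup_opt  { $$ = lappend($1, $2); }
        | /* EMPTY */                           { $$ = NIL; }
        ;

    base_backup_opt:
          K_LABEL SCONST  { $$ = makeDefElem("label", (Node *) makeString($2)); }
        | K_PROGRESS      { $$ = makeDefElem("progress", (Node *) makeInteger(TRUE)); }
        | K_WAL           { $$ = makeDefElem("wal", (Node *) makeInteger(TRUE)); }
        ;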
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 12:42, Magnus Hagander mag...@hagander.net wrote: On Thu, Jan 20, 2011 at 05:23, Fujii Masao masao.fu...@gmail.com wrote: It's helpful to document what to set to allow pg_basebackup connection. That is not only the REPLICATION privilege but also max_wal_senders and pg_hba.conf. Hmm. Yeha, i guess that wouldn't hurt. Will add that. Added, see github branch. + refsect1 + titleOptions/title Can we list the descriptions of option in the same order as pg_basebackup --help does? It's helpful to document that the target directory must be specified and it must be empty. Yeah, that's on the list - I just wanted to make any other changes first before I did that. I based on (no further) feedback and a few extra questions, I'm going to change it per your suggestion to use -D dir -F format, instead of -D/-T, which will change that stuff anyway. So I'll reorder them at that time. Updated on github. + para + The backup will include all files in the data directory and tablespaces, + including the configuration files and any additional files placed in the + directory by third parties. Only regular files and directories are allowed + in the data directory, no symbolic links or special device files. The latter sentence means that the backup of the database cluster created by initdb -X is not supported? Because the symlink to the actual WAL directory is included in it. No, it's not. pg_xlog is specifically excluded, and sent as an empty directory, so upon restore you will have an empty pg_xlog directory. Actually, when I verified that statement, I found a bug where we sent the wrong thing if pg_xlog was a symlink, leading to a corrupt tarfile! Patch is in the github branch. OTOH, I found the following source code comments: + * Receive a tar format stream from the connection to the server, and unpack + * the contents of it into a directory. Only files, directories and + * symlinks are supported, no other kinds of special files. This says that symlinks are supported. Which is true? Is the symlink supported only in tar format? That's actually a *backend* side restriction. If there is a symlink anywhere other than pg_tblspc in the data directory, we simply won't send it across (with a warning). The frontend code supports creating symlinks, both in directory format and in tar format (actually, in tar format it doesn't do anything, of course, it just lets it through) It wouldn't actually be hard to allow the inclusion of symlinks in the backend side. But it would make verification a lot harder - for example, if someone symlinked out pg_clog (as an example), we'd back up the symlink but not the actual files since they're not actually registered as a tablespace. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in SignalSomeChildren
--On 22. Dezember 2010 15:51:09 +0900 Fujii Masao masao.fu...@gmail.com wrote: How about doing target != ALL test at the head for the most common case (target == ALL)? That's an idea, but the test you propose implements it incorrectly. Thanks! I revised the patch. I had a look at this for the current CF and the patch looks reasonable to me. Some testing shows that the changes are working as intended (at least, the wal sender actually receives now signals from SignalSomeChildren() as far as the DEBUG4 output shows). Maybe we should put in a small comment, why we special case BACKEND_TYPE_ALL (following Tom's comment about expensive shared memory access and IsPostmasterChildWalSender()). I marked it as Ready for Committer. Question for my understanding: While reading the small patch, i realized that there's no BACKEND_TYPE_WALRECV or similar. If i understand correctly there's no need to handle it this way, since there's only one wal receiver process per instance? -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
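For readers following along, the shape of the special case being discussed is roughly this (variable names assumed from postmaster.c's loop over the backend list):

    /* common case: signal everything without touching shared memory */
    if (target != BACKEND_TYPE_ALL)
    {
        /* only here do we need IsPostmasterChildWalSender(), which reads shmem */
        int child_type = BACKEND_TYPE_NORMAL;

        if (IsPostmasterChildWalSender(bp->child_slot))
            child_type = BACKEND_TYPE_WALSND;
        if (!(target & child_type))
            continue;
    }
    signal_child(bp->pid, signal);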
Re: [HACKERS] SQL/MED - file_fdw
On Wed, Jan 19, 2011 at 00:34, Shigeru HANADA han...@metrosystems.co.jp wrote: Attached patch requires FDW API patches and copy_export-20110114.patch. Some minor comments: * Can you pass slot-tts_values and tts_isnull directly to NextCopyFrom()? It won't allocate the arrays; just fill the array buffers. * You can pass NULL for the 4th argument for NextCopyFrom(). | Oid tupleoid; /* just for required parameter */ * file_fdw_validator still has duplicated codes with BeginCopy, but I have no idea to share the validation code in clean way... * Try strVal() instead of DefElem-val.str * FdwEPrivate seems too abbreviated for me. How about FileFdwPrivate? * private is a bad identifier name because it's a C++ keyword. We should rename FdwExecutionState-private. In that message, you also pointed out that FDW must generate explainInfo in every PlanRelScan call even if the planning is not for EXPLAIN. I'll try to defer generating explainInfo until EXPLAIN VERBOSE really uses it. It might need new hook point in expalain.c, though. I complained about the overhead, but it won't be a problem for file_fdw and pgsql_fdw. file_fdw can easily generate the text, and pgsql_fdw needs to generate a SQL query anyway. My concern is the explainInfo interface is not ideal for the purpose and therefore it will be unstable interface. If we support nested plans in FDWs, each FDW should receive a tree writer used internally in explain.c. explainInfo, that is a plan text, is not enough for complex FdwPlans. However, since we don't have any better solution for now, we could have the variable for 9.1. It's much better than nothing. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
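For illustration, the suggested call would make the per-tuple loop in file_fdw look roughly like this (state field names are from the patch under review, and the NextCopyFrom signature is the one added by the COPY API patch, with the optional tuple OID pointer as the fourth argument):

    ExecClearTuple(slot);
    if (NextCopyFrom(festate->cstate, slot->tts_values, slot->tts_isnull, NULL))
        ExecStoreVirtualTuple(slot);
    return slot;    /* an empty slot signals end of scan */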
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's part of the overall idea to make sure files are not inadvertently exchanged between different backups and that a file is not truncated. In the future I'd also like to add a checksum to the TOC so that a backup can be checked for integrity. This will cost performance but with the parallel backup it can be distributed to several processors. Ok. I'm going to leave out the filesize. I can see some value in that, and the CRC, but I don't want to add stuff that's not used at this point. Okay. The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. So what you could do is dump in the tar format, untar and restore in the directory format. I see that this sounds nice but still I am not sure why someone would dump to the tar format in the first place. But you still cannot go back from the directory archive to the tar archive because the standard command line tar will not respect the order of the objects that pg_restore expects in a tar format, right? That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. The command line gzip will probably add its own header to the file that pg_restore would need to strip off... This is a valid use case for people who are concerned with a fast dump, usually they would dump uncompressed and later compress the archive. However once we have parallel pg_dump, this advantage vanishes. Now that we have an API for compression in compress_io.c, it probably wouldn't be very hard to implement the missing compression support to tar format either. True, but the question to the advantage of the tar format remains :-) A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. Yes, but you cannot pg_restore the archive then if it was created with standard tar, right? Joachim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] streaming replication feature request
Scott Ribe wrote: How about supporting something like: wal_keep_segments = '7d' [ moved to hackers] Sorry for the late reply. That is a very interesting idea. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
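For comparison, what exists today versus the idea above (the duration spelling is purely hypothetical):

    # today: retention expressed as a number of 16MB WAL segments
    wal_keep_segments = 256
    # proposed: retention expressed as an age
    #wal_keep_segments = '7d'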
Re: [HACKERS] Moving test_fsync to /contrib?
Greg Smith wrote: Alvaro Herrera wrote: I don't understand why it would be overkill. Are you saying people would complain because you installed a 25 kB executable that they might not want to use? Just for fun I checked /usr/bin and noticed that I have a pandoc executable, weighing 17 MB, that I have never used and I have no idea what is it for. It's for converting between the various types of text-like markup, i.e. reST, LaTex, Markdown, etc. Anyway, just because the rest of the world has no standards anymore doesn't mean we shouldn't. The changes Bruce has made recently have gotten this tool to where its output is starting to be readable and reliable. The sort of people who want to run this will certainly be fine with installing contrib to do it, because they may want to have things like pgbench too. There's really not enough demand for this to pollute the default server install footprint with any overhead from this tool, either in bytes or increased tool name squatting. And the fact that it's still a little rough around the edges nudges away from the standard server package too. Install in contrib as pg_test_fsync and I think you'll achieve the optimal subset of people who can be made happy here. Not having it packaged at all before wasn't a big deal, because it was so hard to collect good data from only developer-level people were doing it anyway. Now that it is starting to be more useful in that role for less experienced users, we need to make it easier for more people to run it, to collect feedback toward further improving its quality. OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving test_fsync to /contrib?
On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote: OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? I don't see too much reason to rename it more than necessary, so how about pg_test_fsync? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints
On Thu, Jan 20, 2011 at 12:57 AM, Noah Misch n...@leadboat.com wrote: There are two distinct questions here: Agreed. (1) Should reindex_relation receive boolean facts from its callers by way of two boolean arguments, or by way of one flags vector? The former seems best when you want every caller to definitely think about which answer is right, and the latter when that's not so necessary. (For a very long list of options, the flags might be chosen on other grounds.) As framed, I'd lean toward keeping distinct arguments, as these are important questions. My main beef with the Boolean flags is that this kind of thing is not too clear: reindex_relation(myrel, false, false, true, true, false, true, false, false, true); Unless you have an excellent memory, you can't tell what the heck that's doing without flipping back and forth between the function definition and the call site. With a bit-field, it's a lot easier to glance at the call site and have a clue what's going on. We're of course not quite to the point of that exaggerated example yet. However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS and REINDEX_ALLOW_OLD_INDEX_USE. Then, flags = 0 can hurt performance but not correctness. That's looking like a win. I prefer the positive sense for those flags because I think it's more clear. There aren't so many call sites or so many people using this that we have to worry about what people are going to do in new calling locations; getting it right in any new code shouldn't be a consideration. (2) Should reindex_relation frame its boolean arguments in terms of what the caller did (heap_rebuilt, tuples_changed), or in terms of what reindex_relation will be doing (check_constraints, suppress_index_use)? Yeah, I know we're doing the former now, but I think it's just getting confusing for exactly the reasons you state: I agree that both heap_rebuilt and tuples_changed are bad abstractions. TRUNCATE is lying about the former, and the latter, as you say, is never really correct. column_values_changed, perhaps. tuples_could_now_violate_constraints would be correct, but that's just a bad spelling for REINDEX_CHECK_CONSTRAINTS. I guess the equivalent long-winded improvement for heap_rebuilt would be indexes_still_valid_for_snapshotnow. Eh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
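A sketch of how the flags-vector spelling might read at the call sites, with positive-sense flag names improvised for this sketch:

    #define REINDEX_CHECK_CONSTRAINTS    0x01
    #define REINDEX_SUPPRESS_INDEX_USE   0x02

    /* ALTER TABLE rewriting column values: recheck constraints, don't trust old indexes */
    reindex_relation(relid, REINDEX_CHECK_CONSTRAINTS | REINDEX_SUPPRESS_INDEX_USE);

    /* TRUNCATE: nothing can be newly violated, so the safe default is simply 0 */
    reindex_relation(relid, 0);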
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Wed, Jan 19, 2011 at 4:27 PM, Simone Aiken sai...@ulfheim.net wrote: In my experience, size increases related to documentation are almost always worth it, so I'm prejudiced right out of the gate. I was wondering whether every pg_ table gets copied out to every database. If there is already a mechanism for not replicating all of them, we could use views or rewrite rules to merge a single copy of the catalog comments, kept in a separate table, with each deployed database's pg_description. All of them get copied, except for a handful of so-called shared catalogs. Changing that would be difficult. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sepgsql contrib module
Excerpts from Robert Haas's message of jue ene 20 00:10:55 -0300 2011: You have to write it with a line of dashes on the first and last lines, if you don't want it reformatted as a paragraph. It might be worth actually running pgindent over contrib/selinux and then check over the results. Hmm, I don't think pgindent is run regularly on contrib as it is on the core code. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
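For reference, the dashed-comment convention being referred to looks like this; pgindent leaves the interior layout of such a comment alone instead of reflowing it as a paragraph:

    /*----------
     * A carefully laid out comment:
     *    option A  -> behavior X
     *    option B  -> behavior Y
     *----------
     */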
Re: [HACKERS] pg_basebackup for streaming base backups
Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sepgsql contrib module
On Thu, Jan 20, 2011 at 9:59 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of jue ene 20 00:10:55 -0300 2011: You have to write it with a line of dashes on the first and last lines, if you don't want it reformatted as a paragraph. It might be worth actually running pgindent over contrib/selinux and then check over the results. Hmm, I don't think pgindent is run regularly on contrib as it is on the core code. I went back and looked at commit 239d769e7e05e0a5ef3bd6828e93e22ef3962780 and it touches both src and contrib. But even if we don't always do that, it seems like a good idea to fix whatever we're committing so that a hypothetical future pgindent run won't mangle it. Incidentally, I thought that running pgindent twice in the 9.0 cycle, once after the end of CF4 and again just before the branch worked well. Anyone up for doing it that way again this time? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. Uh, wow. That's really mixing apples and oranges. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Hi Tomas, On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote: No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table. In the cases where sampling is enough, you can do that to the updates too: do a sampling on the changes, in that you only process every Nth change to make it to the estimator. If you can also dynamically tune the N to grow it as the statistics stabilize, and lower it if you detect high variance, even better. If the analyze process could be decoupled from the backends, and maybe just get the data passed over to be processed asynchronously, then that could be a feasible way to have always up to date statistics when the bottleneck is IO and CPU power is in excess. If that then leads to better plans, it could really be a win exceeding the overhead. If this analyze process (or more of them) could also just get the data from the modified buffers in a cyclic way, so that backends need nothing extra to do, then I don't see any performance disadvantage other than possible extra locking contention on the buffers and non-determinism of the actual time when a change makes it to the statistics. Then you just need to get more CPU power and higher memory bandwidth to pay for the accurate statistics. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas wrote: On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. Uh, wow. That's really mixing apples and oranges. I read the description as: +You can also use the xref linkend=app-pgbasebackup tool to take +the backup, instead of manually copying the files. This tool will take +care of the functionpg_start_backup()/, copy and +functionpg_stop_backup()/ steps automatically, and transfers the +backup over a regular productnamePostgreSQL/productname connection +using the replication protocol, instead of requiring filesystem level +access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sepgsql contrib module
Alvaro Herrera alvhe...@commandprompt.com writes: Hmm, I don't think pgindent is run regularly on contrib as it is on the core code. News to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Is there a way to build PostgreSQL client libraries with MinGW
Hi, We are using R to work with the 64-bit PostgreSQL client libraries, and to avoid compiler compatibility issues the R development community suggests using the same compiler for both the main application and its DLLs. Do you have any experience building libpq.dll with 64-bit MinGW? Thanks. Xiaobo Gu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 10:15 AM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. Uh, wow. That's really mixing apples and oranges. I read the description as: + You can also use the xref linkend=app-pgbasebackup tool to take + the backup, instead of manually copying the files. This tool will take + care of the functionpg_start_backup()/, copy and + functionpg_stop_backup()/ steps automatically, and transfers the + backup over a regular productnamePostgreSQL/productname connection + using the replication protocol, instead of requiring filesystem level + access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. Yeah, but pg_start_backup() and pg_stop_backup() are server functions, and this is an application. Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. That, however, is not going to happen for 9.1... but it would be *really* cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On 20.01.2011 15:46, Joachim Wieland wrote: On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. So what you could do is dump in the tar format, untar and restore in the directory format. I see that this sounds nice but still I am not sure why someone would dump to the tar format in the first place. I'm not sure either. Maybe you want to pipe the output of pg_dump -F t via an ssh tunnel to another host, where you untar it, producing a directory format dump. You can then edit the directory format dump, and restore it back to the database without having to tar it again. It gives you a lot of flexibility if the formats are compatible, which is generally good. But you still cannot go back from the directory archive to the tar archive because the standard command line tar will not respect the order of the objects that pg_restore expects in a tar format, right? Hmm, I didn't realize pg_restore requires the files to be in certain order in the tar file. There's no mention of that in the docs either, we should add that. It doesn't actually require that if you read from a file, but from stdin it does. You can put files in the archive in a certain order if you list them explicitly in the tar command line, like tar cf backup.tar toc.dat It's hard to know the right order, though. In practice you would need to do tar tf backup.tar files before untarring, and use files to tar them again in the rightorder. That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. The command line gzip will probably add its own header to the file that pg_restore would need to strip off... Yeah, we should write the header too. That's not hard, e.g gzopen will do that automatically, or you can pass a flag to deflateInit2. A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. Yes, but you cannot pg_restore the archive then if it was created with standard tar, right? See above, you can unless you try to pipe it to pg_restore. In fact, that's listed as an advantage of the tar format over other formats in the pg_dump documentation. (I'm working on this, no need to submit a new patch) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
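As a pointer for whoever implements it, both routes mentioned produce standard gzip framing (a sketch; buf/len stand for the data being written):

    #include <zlib.h>

    static void
    write_gzipped(const char *path, const void *buf, unsigned len)
    {
        gzFile f = gzopen(path, "wb");   /* gzopen writes the gzip header itself */

        gzwrite(f, buf, len);
        gzclose(f);
    }

    /* or, with the low-level deflate API, request a gzip wrapper rather than
     * raw zlib framing by adding 16 to windowBits:
     *   deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);
     */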
Re: [HACKERS] Moving test_fsync to /contrib?
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote: OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? I don't see too much reason to rename it more than necessary, so how about pg_test_fsync? Yeah, there's no reason to try to confuse people about whether it's the same program or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Include WAL in base backup
Magnus Hagander mag...@hagander.net writes: - Why not initialize logid and logseg like so?: int logid = startptr.xlogid; int logseg = startptr.xrecoff / XLogSegSize; Then use those in your elog? Seems cleaner to me. Hmm. Yes. Agreed. Marginal complaint here: int is the wrong type, I'm pretty sure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
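As a self-contained illustration of Tom's complaint (the struct, values, and segment size below are stand-ins, not the real XLogRecPtr declaration): the two fields are unsigned 32-bit quantities, so the locals should match rather than be plain int:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the relevant XLogRecPtr fields; sketch only. */
    typedef struct { uint32_t xlogid; uint32_t xrecoff; } XLogRecPtrSketch;
    #define XLogSegSize (16 * 1024 * 1024)      /* 16 MB segments, the default */

    int
    main(void)
    {
        XLogRecPtrSketch startptr = { 3, 0x5000000 };

        /* "int" would be the wrong type: these are unsigned 32-bit values. */
        uint32_t    logid = startptr.xlogid;
        uint32_t    logseg = startptr.xrecoff / XLogSegSize;

        printf("log %u, segment %u\n", logid, logseg);
        return 0;
    }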
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On Jan20, 2011, at 16:22 , Heikki Linnakangas wrote: You can put files in the archive in a certain order if you list them explicitly in the tar command line, like tar cf backup.tar toc.dat It's hard to know the right order, though. In practice you would need to do tar tf backup.tar files before untarring, and use files to tar them again in the rightorder. Hm, could we create a file in the backup directory which lists the files in the right order? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-scope advisory locks
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes: Another thing I now see is this: BEGIN; SELECT pg_advisory_xact_lock(1); -- do something here -- upgrade to session lock SELECT pg_advisory_lock(1); COMMIT; This seems useful, since the xact lock would be automatically released if an error happens during -- do something here so you wouldn't need to worry about releasing the lock elsewhere. But I'm not sure this is safe. Can anyone see a problem with it? I think the POLA dictates that the behavior of that should be that you now have both a transactional and a nontransactional hold on the lock; and only the transactional hold goes away at commit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
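A hedged demonstration of the semantics Tom describes, assuming the patch's proposed pg_advisory_xact_lock() function and an invented connection string: after COMMIT only the session-level hold should remain, which can be checked in pg_locks from the same connection.

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    run(PGconn *conn, const char *sql)
    {
        PGresult   *res = PQexec(conn, sql);

        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
        PQclear(res);
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        run(conn, "BEGIN");
        run(conn, "SELECT pg_advisory_xact_lock(1)");   /* transactional hold */
        run(conn, "SELECT pg_advisory_lock(1)");        /* session hold */
        run(conn, "COMMIT");                            /* only the xact hold goes away */

        /* The session-level hold should still be visible here. */
        PGresult   *res = PQexec(conn,
            "SELECT count(*) FROM pg_locks WHERE locktype = 'advisory'");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("advisory locks still held: %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);

        run(conn, "SELECT pg_advisory_unlock(1)");      /* release session hold */
        PQfinish(conn);
        return 0;
    }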
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas wrote: I read the description as: + ? ?You can also use the xref linkend=app-pgbasebackup tool to take + ? ?the backup, instead of manually copying the files. This tool will take + ? ?care of the functionpg_start_backup()/, copy and + ? ?functionpg_stop_backup()/ steps automatically, and transfers the + ? ?backup over a regular productnamePostgreSQL/productname connection + ? ?using the replication protocol, instead of requiring filesystem level + ? ?access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. Yeah, but pg_start_backup() and pg_stop_backup() are server functions, and this is an application. Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. That, however, is not going to happen for 9.1... but it would be *really* cool. Well, when we originally implemented PITR, we could have found a way to avoid using SQL commands to start/stop backup, but we envisioned that we would want to hook things on to those commands so we created a stable API that we could improve, and we have. Do we envision pg_basebackup as something we will enahance, and if so, should we consider a generic name? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Fujii Masao masao.fu...@gmail.com writes: On Thu, Jan 20, 2011 at 10:53 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm not sure why that's the right solution. Why do you think that we should not create the tablespace under the $PGDATA directory? I'm not surprised that people mounts the filesystem on $PGDATA/mnt and creates the tablespace on it. No? Usually, having a mount point in a non-root-owned directory is considered a Bad Thing. Hmm.. but ISTM we can have a root-owned mount point in $PGDATA and create a tablespace there. Nonsense. The more general statement is that it's a security hole unless the mount point *and everything above it* is root owned. In the case you sketch, there would be nothing to stop the (non root) postgres user from renaming $PGDATA/mnt to something else and then inserting his own trojan-horse directories. Given that nobody except postgres and root could get to the mount point, maybe there wouldn't be any really serious problems caused that way --- but I still say that it's bad practice that no competent sysadmin would accept. Moreover, I see no positive *good* reason to do it. There isn't anyplace under $PGDATA that users should be randomly creating directories, much less mount points. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost sfr...@snowman.net wrote: This patch looked good, in general, to me. I added a few documentation updates and a comment, but it's a very straight-forward patch as far as I can tell. Passes all regressions and my additional testing. I have not looked at the code for this patch at all as yet, but just as a general user comment - I really, really want this. It's one of about, uh, two pieces of information that the EXPLAIN output doesn't give you that is incredibly important for troubleshooting. What's the other one? The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 22:16 -0500, Robert Haas wrote: That's another way of saying the patch is not anywhere close to being done. My patch is materially incomplete. Certainly we may see that as grounds for rejection, which I would not and could not argue with. It is a popular feature, so I submitted anyway. When I said Noah's patch was trivial, I was referring to the amount of work expended on it so far; no insult intended. I think the amount of code to finish either is fairly low as well. If we wish to continue in this release then we must decide how. What I was trying to indicate in my earlier comments was that my focus is on achieving the required functionality in this release, or put another way, I would accept Noah's patch rather than end with nothing. The main requirement, as I see it, is error checking. We need to do the same checking however we do it; neither patch currently does it. If Noah's patch had error checking, then it would at least be safe to recommend people do that. Then it is a simple matter of whether we think implicit is OK, or whether it needs an explicit command. My patch does it explicitly because that was the consensus from the earlier discussion; I am in favour of the explicit route which is why I wrote the patch that way, not because I wrote it that way. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 16:45, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: I read the description as: + ? ?You can also use the xref linkend=app-pgbasebackup tool to take + ? ?the backup, instead of manually copying the files. This tool will take + ? ?care of the functionpg_start_backup()/, copy and + ? ?functionpg_stop_backup()/ steps automatically, and transfers the + ? ?backup over a regular productnamePostgreSQL/productname connection + ? ?using the replication protocol, instead of requiring filesystem level + ? ?access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. Yeah, but pg_start_backup() and pg_stop_backup() are server functions, and this is an application. Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. That, however, is not going to happen for 9.1... but it would be *really* cool. Well, when we originally implemented PITR, we could have found a way to avoid using SQL commands to start/stop backup, but we envisioned that we would want to hook things on to those commands so we created a stable API that we could improve, and we have. Yeah, we're certianly not taking those *away*. Do we envision pg_basebackup as something we will enahance, and if so, should we consider a generic name? Well, it's certainly going to be enhanced. I think there are two main uses for it - backups, and setting up replication slaves. I can't see it expanding beyond those, really. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost sfr...@snowman.net wrote: This patch looked good, in general, to me. I added a few documentation updates and a comment, but it's a very straight-forward patch as far as I can tell. Passes all regressions and my additional testing. I have not looked at the code for this patch at all as yet, but just as a general user comment - I really, really want this. It's one of about, uh, two pieces of information that the EXPLAIN output doesn't give you that is incredibly important for troubleshooting. What's the other one? In the following sort of plan: rhaas=# explain analyze select * from bob b, sally s where b.a = s.a; QUERY PLAN --- Nested Loop (cost=0.00..117890.00 rows=1000 width=8) (actual time=0.036..533.372 rows=1000 loops=1) - Seq Scan on sally s (cost=0.00..5770.00 rows=40 width=4) (actual time=0.014..46.469 rows=40 loops=1) - Index Scan using bob_pkey on bob b (cost=0.00..0.27 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=40) Index Cond: (a = s.a) Total runtime: 533.935 ms (5 rows) ...you cannot really tell how many rows the index scan was expected to match, or actually did match. The answer to the latter question certainly isn't 0. We previously discussed making the rows= line go out to three decimal places when used in an inner-index-scan context, which would help a lot - you could then multiply rows by loops to get an approximate answer. My preferred fix would be just to remove the unhelpful division-by-nloops code that gets applied in this case, but that's less backward-compatible. The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. It's hard to predict the nature of future feature requests, but this and the above are at the top of my list of ongoing gripes, and there isn't a close third. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
* Tom Lane (t...@sss.pgh.pa.us) wrote: The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. This is going into a new major version, not something which is being back-patched, and users now have a way in a released version to get away from relying on the string output. Have we worried about adding new plan nodes due to breakage in the explain output..? It strikes me that we've actually changed it with some regularity, in one aspect or another, over a couple of releases. Maybe my memory is bad though. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders>0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. On not needing max_wal_senders, we're already on our way: you just have to use the external walreceiver that Magnus already wrote the code for. WAL level, I don't know that we have that already, but a big part of what this base backup tool is useful for is preparing a standby… so certainly you want to change that setup there. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. This is going into a new major version, not something which is being back-patched, and users now have a way in a released version to get away from relying on the string output. I agree; we make bigger changes than this all the time. At the risk of putting words in Tom's mouth, I think part of the concern here may be that the EXPLAIN output is quite verbose already, and adding a few more details is going to make it harder to read in the cases where you *don't* care about this additional information. That's a valid concern, but I don't know what to do about it - not having this information available isn't better. It's tempting to propose moving the actual numbers down to the next line, so that the lines aren't so ridiculously long. Looking at the patch, I have to say I had hoped this was going to show nfiltered in both the estimated and actual cases, which it doesn't. Now maybe that's more work than we want to put in, but it would be nice to have. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest wrap-up
On Tue, Dec 21, 2010 at 10:49 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: - Writeable CTEs - I think we need Tom to pick this one up. - Fix snapshot taking inconsistencies - Ready for committer. Can any committer pick this up? Will take a look at these two also. Tom, what is your time frame on this? I think we should wrap up the CF without these and bundle 9.1alpha3 unless you plan to get to this in the next day or two. We probably shouldn't hold up the alpha for these, if there are no other items outstanding. OK. I've moved them to the next CommitFest and marked this one closed. Tom, are you still planning to pick these two up? They've been basically collecting dust for over two months now, or in one case three months, and we're running out of time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On 2011-01-20 7:07 PM +0200, Robert Haas wrote: Looking at the patch, I have to say I had hoped this was going to show nfiltered in both the estimated and actual cases, which it doesn't. Now maybe that's more work than we want to put in, but it would be nice to have. That would be fantastical, if we can make it happen. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: something or other - subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the actual time numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest wrap-up
Robert Haas robertmh...@gmail.com writes: On Tue, Dec 21, 2010 at 10:49 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: - Writeable CTEs - I think we need Tom to pick this one up. - Fix snapshot taking inconsistencies - Ready for committer. Can any committer pick this up? Tom, are you still planning to pick these two up? They've been basically collecting dust for over two months now, or in one case three months, and we're running out of time. Yes, I will get to them. I haven't yet put my head down into full commit fest mode... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 11:59 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. Not using max_wal_senders we're on our way, you just have to use the external walreceiver that Magnus the code for already. WAL level, I don't know that we have that already, but a big part of what this base backup tool is useful for is preparing a standby… so certainly you want to change that setup there. Well, yeah, but it would be nice to also use it just to take a regular old backup on a system that doesn't otherwise need replication. I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Right now, the backends get the value through the GUC machinery, and so there's no particular bound on how long it could take for them to pick up the new value. I think if we could find some way of making sure that the backends got the new value in a reasonably timely fashion, we'd be pretty close to being able to do this. But it's hard to see how to do that. I had some vague idea of creating a mechanism for broadcasting critical parameter changes. You'd make a structure in shared memory containing the canonical values of wal_level and all other critical variables, and the structure would also contain a 64-bit counter. Whenever you want to make a parameter change, you lock the structure, make your change, bump the counter, and release the lock. Then, there's a second structure, also in shared memory, where backends report the value that the counter had the last time they updated their local copies of the structure from the shared structure. You can watch that to find out when everyone's guaranteed to have the new value. If someone doesn't respond quickly enough, you could send them a signal to get them moving. What would really be ideal is if you could actually make this safe enough that the interrupt service routine could do all the work, rather than just setting a flag. Or maybe CHECK_FOR_INTERRUPTS(). If you can't make it safe enough to put it in someplace pretty low-level like that, the whole idea might fall apart, because it wouldn't be useful to have a way of doing this that mostly works except sometimes it just sits there and hangs for a really long time. All pie in the sky at this point... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
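A toy model of that idea — ordinary process-local data and invented names rather than real PostgreSQL backends and shared memory — just to show the counter/acknowledgment handshake Robert sketches:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_BACKENDS 4

    typedef struct
    {
        pthread_mutex_t lock;
        uint64_t        change_counter; /* bumped on every parameter change */
        int             wal_level;      /* canonical value of the parameter */
    } SharedConfig;

    static SharedConfig shared = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

    /* last change_counter value each "backend" has applied locally */
    static volatile uint64_t acked[MAX_BACKENDS];

    static void
    change_wal_level(int new_level)
    {
        pthread_mutex_lock(&shared.lock);
        shared.wal_level = new_level;
        shared.change_counter++;
        pthread_mutex_unlock(&shared.lock);
    }

    static void
    backend_check_for_change(int id, int *local_wal_level)
    {
        pthread_mutex_lock(&shared.lock);
        *local_wal_level = shared.wal_level;
        acked[id] = shared.change_counter;  /* acknowledge the new value */
        pthread_mutex_unlock(&shared.lock);
    }

    static int
    everyone_caught_up(uint64_t target)
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
            if (acked[i] < target)
                return 0;
        return 1;
    }

    int
    main(void)
    {
        int         local[MAX_BACKENDS] = {0};

        change_wal_level(2);            /* e.g. boost minimal -> archive */

        /* in reality the coordinator would signal laggards and sleep */
        for (int i = 0; i < MAX_BACKENDS; i++)
            backend_check_for_change(i, &local[i]);

        if (everyone_caught_up(shared.change_counter))
            printf("safe to checkpoint at the higher wal_level\n");
        return 0;
    }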
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 12:57 PM, Magnus Hagander mag...@hagander.net wrote: On Jan 20, 2011 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: something or other - subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the actual time numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. Haven't looked at what changes with this patch, but dont forget PgAdmin that also parses the output. Though if the format changes enough to affect it, that might be the driving force to have it use xml format instead, which is the one that is intended for machine parsing after all.. How much has that code been updated from one release to the next? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Jan 20, 2011 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: something or other - subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the actual time numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. Haven't looked at what changes with this patch, but dont forget PgAdmin that also parses the output. Though if the format changes enough to affect it, that might be the driving force to have it use xml format instead, which is the one that is intended for machine parsing after all.. /Magnus
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 10:07:23AM +, Simon Riggs wrote: On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: First, I'd like to note that the thread for this patch had *four* me-too responses to the use case. That's extremely unusual; the subject is definitely compelling to people. It addresses the bad behavior of natural attempts to atomically swap two tables in the namespace: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' # Do it this way, and get: ERROR: could not open relation with OID 41380 psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' by letting you do this instead: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock psql -c 'EXCHANGE TABLE new_t TO t psql -c 'SELECT * FROM t' # I get 'new', finally! psql -c 'DROP TABLE IF EXISTS t, new_t' I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread interesting: can we just make the first example work? Even granting that the second syntax may be a useful addition, the existing behavior of the first example is surely worthless, even actively harmful. I tossed together a proof-of-concept patch, attached, that makes the first example DTRT. Do you see any value in going down that road? As I said previously on the thread you quote, having this happen implicitly is not a good thing, and IMHO, definitely not the right thing. When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right Thing for that query to wake up and proceed based on the complete, final state of that committed DDL. Aside from the waiting itself, the query should behave as though it started after the DDL completed. In my example, the SELECT silently reads data from a table named old_t. What if that were an INSERT? The data falls in the wrong table. Heikki's suggestion, and your patch, contain no checking to see whether the old and new tables are similar. If they are not similar then we have all the same problems raised by my patch. SQL will suddenly fail because columns have ceased to exist, FKs suddenly disappear etc.. Indeed, Heikki's suggestion and my patch would not do such verification. I can't see detecting and blocking some patterns of ALTER TABLE RENAME or DROP ...; CREATE ...; than we allow today. Those need to stay open-ended, with the user responsible for choosing well. So, what's the right concurrent behavior around use of those statements? I answer that above. That said, I see utility in a feature that compares two tables, swaps them if similar, and fixes up foreign keys. Having such a feature does not justify wrong concurrent behavior around ALTER TABLE RENAME. Having right concurrent behavior around ALTER TABLE RENAME does not remove the utility of this feature. We should do both. I don't see how having a patch helps at all. I didn't think it was the right way before you wrote it and I still disagree now you've written it. Perhaps it helped me more than anyone else, and I should have kept it to myself. 
Heikki's suggestion seemed straightforward, so much so that I couldn't figure out why nobody had done it. That would usually mean I'm missing something. So, I implemented it in an effort to discover what I had missed, failing to do so. Then, I sent it with the review in case you might spot what I had missed. Failure to add some kind of table similarity check was intentional, per above. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
* Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: I agree; we make bigger changes than this all the time. No, we don't. Alright, do we want to go down the road of adding new things to the XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT version, to avoid this concern? Stephen signature.asc Description: Digital signature
Re: [HACKERS] Moving test_fsync to /contrib?
On 1/20/11 6:15 AM, Robert Haas wrote: On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote: OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? I don't see too much reason to rename it more than necessary, so how about pg_test_fsync? +1. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Right now, the backends get the value through the GUC machinery, and so there's no particular bound on how long it could take for them to pick up the new value. I think if we could find some way of making sure that the backends got the new value in a reasonably timely fashion, we'd be pretty close to being able to do this. But it's hard to see how to do that. Well, you just said when to force the reload to take effect: at checkpoint time. IIRC we already multiplex SIGUSR1, is that possible to add that behavior here? And signal every backend at checkpoint time when wal_level has changed? I had some vague idea of creating a mechanism for broadcasting critical parameter changes. You'd make a structure in shared memory containing the canonical values of wal_level and all other critical variables, and the structure would also contain a 64-bit counter. Whenever you want to make a parameter change, you lock the structure, make your change, bump the counter, and release the lock. Then, there's a second structure, also in shared memory, where backends report the value that the counter had the last time they updated their local copies of the structure from the shared structure. You can watch that to find out when everyone's guaranteed to have the new value. If someone doesn't respond quickly enough, you could send them a signal to get them moving. What would really be ideal is if you could actually make this safe enough that the interrupt service routine could do all the work, rather than just setting a flag. Or maybe CHECK_FOR_INTERRUPTS(). If you can't make it safe enough to put it in someplace pretty low-level like that, the whole idea might fall apart, because it wouldn't be useful to have a way of doing this that mostly works except sometimes it just sits there and hangs for a really long time. All pie in the sky at this point... Unless we manage to simplify enough the idea to have wal_level SIGHUP. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints
On Thu, Jan 20, 2011 at 09:26:29AM -0500, Robert Haas wrote: My main beef with the Boolean flags is that this kind of thing is not too clear: reindex_relation(myrel, false, false, true, true, false, true, false, false, true); Unless you have an excellent memory, you can't tell what the heck that's doing without flipping back and forth between the function definition and the call site. With a bit-field, it's a lot easier to glance at the call site and have a clue what's going on. We're of course not quite to the point of that exaggerated example yet. Agreed. However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS and REINDEX_ALLOW_OLD_INDEX_USE. ?Then, flags = 0 can hurt performance but not correctness. ?That's looking like a win. I prefer the positive sense for those flags because I think it's more clear. There aren't so many call sites or so many people using this that we have to worry about what people are going to do in new calling locations; getting it right in any new code shouldn't be a consideration. Okay. I've attached a new patch version based on that strategy. diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1c9df98..75e7055 100644 *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *** *** 2533,2558 reindex_index(Oid indexId, bool skip_constraint_checks) * reindex_relation - This routine is used to recreate all indexes * of a relation (and optionally its toast relation too, if any). * ! * If heap_rebuilt is true, then the relation was just completely rebuilt by ! * an operation such as VACUUM FULL or CLUSTER, and therefore its indexes are ! * inconsistent with it. This makes things tricky if the relation is a system ! * catalog that we might consult during the reindexing. To deal with that ! * case, we mark all of the indexes as pending rebuild so that they won't be ! * trusted until rebuilt. The caller is required to call us *without* having ! * made the rebuilt versions visible by doing CommandCounterIncrement; we'll ! * do CCI after having collected the index list. (This way we can still use ! * catalog indexes while collecting the list.) * ! * We also skip rechecking uniqueness/exclusion constraint properties if ! * heap_rebuilt is true. This avoids likely deadlock conditions when doing ! * VACUUM FULL or CLUSTER on system catalogs. REINDEX should be used to ! * rebuild an index if constraint inconsistency is suspected. * * Returns true if any indexes were rebuilt. Note that a * CommandCounterIncrement will occur after each index rebuild. */ bool ! reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt) { Relationrel; Oid toast_relid; --- 2533,2561 * reindex_relation - This routine is used to recreate all indexes * of a relation (and optionally its toast relation too, if any). * ! * flags can include REINDEX_SUPPRESS_INDEX_USE and REINDEX_CHECK_CONSTRAINTS. * ! * If flags has REINDEX_SUPPRESS_INDEX_USE, the relation was just completely ! * rebuilt by an operation such as VACUUM FULL or CLUSTER, and therefore its ! * indexes are inconsistent with it. This makes things tricky if the relation ! * is a system catalog that we might consult during the reindexing. To deal ! * with that case, we mark all of the indexes as pending rebuild so that they ! * won't be trusted until rebuilt. The caller is required to call us *without* ! * having made the rebuilt versions visible by doing CommandCounterIncrement; ! * we'll do CCI after having collected the index list. (This way we can still ! 
* use catalog indexes while collecting the list.) ! * ! * To avoid deadlocks, VACUUM FULL or CLUSTER on a system catalog must omit the ! * REINDEX_CHECK_CONSTRAINTS flag. REINDEX should be used to rebuild an index ! * if constraint inconsistency is suspected. For optimal performance, other ! * callers should include the flag only after transforming the data in a manner ! * that risks a change in constraint validity. * * Returns true if any indexes were rebuilt. Note that a * CommandCounterIncrement will occur after each index rebuild. */ bool ! reindex_relation(Oid relid, bool toast_too, int flags) { Relationrel; Oid toast_relid; *** *** 2608,2614 reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt) List *doneIndexes; ListCell *indexId; ! if (heap_rebuilt) { /* Suppress use of all the indexes until they are rebuilt */ SetReindexPending(indexIds); --- 2611,2617 List *doneIndexes; ListCell *indexId; ! if (flags REINDEX_SUPPRESS_INDEX_USE) { /* Suppress
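To illustrate the readability argument above, here is a stand-alone sketch (not the actual PostgreSQL declarations; only the two flag names are taken from the patch, and the behavior is reduced to print statements) of how a flag-word call site documents itself compared with a string of booleans:

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int Oid;

    /* Flag bits, named as in the patch under discussion. */
    #define REINDEX_CHECK_CONSTRAINTS   (1 << 0)
    #define REINDEX_SUPPRESS_INDEX_USE  (1 << 1)

    static bool
    reindex_relation(Oid relid, bool toast_too, int flags)
    {
        if (flags & REINDEX_SUPPRESS_INDEX_USE)
            printf("relation %u: indexes marked pending-rebuild\n", relid);
        if (flags & REINDEX_CHECK_CONSTRAINTS)
            printf("relation %u: unique/exclusion constraints rechecked\n", relid);
        return true;
    }

    int
    main(void)
    {
        /* Example call requesting both behaviors; readable at a glance. */
        reindex_relation(16384, true,
                         REINDEX_CHECK_CONSTRAINTS | REINDEX_SUPPRESS_INDEX_USE);

        /* Compare the opaque boolean style the thread is moving away from:
         * reindex_relation(16384, true, false, true); */
        return 0;
    }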
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
* Robert Haas (robertmh...@gmail.com) wrote: How much has that code been updated from one release to the next? Just an FYI, I talked to depesz on IRC (please chime in if you disagree with any of this) and he indicated that he's had to update the code from time to time, mostly because the parser was too strict. He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 1:47 PM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: I agree; we make bigger changes than this all the time. No, we don't. Alright, do we want to go down the road of adding new things to the XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT version, to avoid this concern? No, because, for one thing, the text output is what people are going to send me when they want me to fix their crap. If the information isn't there, I lose. And no, I don't want them to send me the XML. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 2:10 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Right now, the backends get the value through the GUC machinery, and so there's no particular bound on how long it could take for them to pick up the new value. I think if we could find some way of making sure that the backends got the new value in a reasonably timely fashion, we'd be pretty close to being able to do this. But it's hard to see how to do that. Well, you just said when to force the reload to take effect: at checkpoint time. IIRC we already multiplex SIGUSR1, is that possible to add that behavior here? And signal every backend at checkpoint time when wal_level has changed? Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: How much has that code been updated from one release to the next? Just an FYI, I talked to depesz on IRC (please chime in if you disagree with any of this) and he indicated that he's had to update the code from time to time, mostly because the parser was too strict. He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. i'll be happy to update the Pg::Explain to handle new elements of textual plans, so if this would be of concern - please don't treat compatibility with explain.depesz.com as your responsibility/problem. I'll fix the parser (have to add json/xml parsing too anyway), and I, too, would love to get more information. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. How much would using a latch here help? Or would it be overkill? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On 20.01.2011 22:15, Dimitri Fontaine wrote: Robert Haasrobertmh...@gmail.com writes: Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. How much using a latch here would help? Or be overkill? A latch doesn't give you an acknowledgment from the backends that they've received and acted on the guc change. You could use it as a building block to construct that, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exceptions not present in plpy with Python 3
On lör, 2010-12-18 at 18:56 +0100, Jan Urbański wrote: there seems to be a problem in the way we add exceptions to the plpy module in PL/Python compiled with Python 3k. Try this: DO $$ plpy.SPIError $$ language plpython3u; I'm not a Python 3 expert, but I nicked some code from the Internet and came up with this patch (passes regression tests on both Python 2 and 3). It looks like the PyModule_AddObject() approach also works in Python 2. Anyone see an issue with using that uniformly? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
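For reference, a generic sketch of the PyModule_AddObject() idiom in question (not the patch itself; the function name here is illustrative, though plpy.SPIError and PLy_exc_spi_error are the names used by PL/Python), which behaves the same under Python 2 and Python 3:

    #include <Python.h>

    static PyObject *PLy_exc_spi_error;

    static int
    add_exceptions(PyObject *plpy_module)
    {
        /* Create the exception type; returns a new reference or NULL. */
        PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
        if (PLy_exc_spi_error == NULL)
            return -1;

        /* PyModule_AddObject steals a reference on success, so keep our own. */
        Py_INCREF(PLy_exc_spi_error);
        if (PyModule_AddObject(plpy_module, "SPIError", PLy_exc_spi_error) < 0)
        {
            /* On failure nothing was stolen: drop both references. */
            Py_DECREF(PLy_exc_spi_error);
            Py_CLEAR(PLy_exc_spi_error);
            return -1;
        }
        return 0;
    }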
Re: [HACKERS] pl/python refactoring
On ons, 2011-01-19 at 10:06 +0900, Hitoshi Harada wrote: - This is not in the patch, but around line 184 vis versa in comment seems like typo. Fixed. - A line break should be added before PLy_add_exception() after static void I'll add that when I get to the patch. - This is also not in the patch, but the comment /* these should only be called once at the first call * of plpython_call_handler. initialize the python interpreter * and global data. */ is bogus. PLy_init_interp() is called in _PG_init(). Fixed. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
hubert depesz lubaczewski dep...@depesz.com writes: On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote: He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. i'll be happy to update the Pg::Explain to handle new elements of textual plans, so if this would be of concern - please don't treat compatibility with explain.depesz.com as your responsibility/problem. The point isn't whether it'd be terribly difficult to update client-side EXPLAIN-parsing code ... it's whether we should break it in the first place. I don't find the proposed format so remarkably well-designed that it's worth creating compatibility problems for. The main functional problem I see with this format is that it assumes there is one and only one filter step associated with every plan node. That is just plain wrong. Many don't have any, and there are important cases where there are two. I'm thinking in particular that it might be useful to distinguish the effects of the recheck and the filter conditions of a bitmap heap scan. Maybe it'd also be interesting to separate the join and non-join filter clauses of a join node, though I'm less sure about the usefulness of that. So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x > 42) (nfiltered = 123) or Filter Cond: (x > 42) Rows Filtered: 123 The latter is less ambiguous, but takes more vertical space. The former is very unlikely to break any client code, because I doubt there is any that inquires into the details of what a filter condition expression really means. The latter *might* break code depending on how much it assumes about the number of detail lines attached to a plan node ... but as Robert pointed out, we've added new detail lines before. BTW, is it just me, or is the terminology "number filtered" pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps "nremoved" would be clearer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: The main functional problem I see with this format is that it assumes there is one and only one filter step associated with every plan node. That is just plain wrong. Many don't have any, and there are important cases where there are two. I'm thinking in particular that it might be useful to distinguish the effects of the recheck and the filter conditions of a bitmap heap scan. If it's not too hard to do that, I'm all in favor. Maybe it'd also be interesting to separate the join and non-join filter clauses of a join node, though I'm less sure about the usefulness of that. That would also be extremely useful. So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x 42) (nfiltered = 123) or Filter Cond: (x 42) Rows Filtered: 123 The latter is less ambiguous, but takes more vertical space. The former is very unlikely to break any client code, because I doubt there is any that inquires into the details of what a filter condition expression really means. The latter *might* break code depending on how much it assumes about the number of detail lines attached to a plan node ... but as Robert pointed out, we've added new detail lines before. I like the idea of putting it on the same line as the filter condition, but your proposal for how to do that doesn't wow me - the parentheses look too similar to the ones around the qual itself. BTW, is it just me, or is the terminology number filtered pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps nremoved would be clearer. I think filtered is pretty clear and like it... removed sounds like you deleted something. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com wrote: I think filtered is pretty clear and like it... I find it ambiguous. [Takes sip of filtered water.] How about excluded? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Orphaned statements issue
Hackers, One of our clients is seeing an unusual issue in which waiting statements go to sleep forever and never complete. This first e-mail is for a "has anyone else seen this problem before?" while we try to collect additional information for a diagnosis. This problem only happens under load and only when performing load tests that insert large data (250 KiB) into bytea columns. It takes a couple hours but we've been able to reproduce the issue with a 100% success rate. Sometimes the locked query shows up as 'BIND' in pg_stat_activity; sometimes it's a complicated query using a multi-table left outer join, other times it's a simple select. The only thing in common is that there is never a corresponding entry for that statement in the pg_locks table, and if you drop the connection the query goes away. An strace on the process shows it to be in RECV, and otherwise doing nothing. We have not been able to run GDB because it takes a couple hours of running a heavy load test to cause the issue. While memory is heavily used during the test, there is no swapping during the test that would point to Linux memory management as the culprit. We can reproduce the issue on 8.1.11 and 8.1.23. Currently we are working on testing it on 9.0 and seeing if we can reproduce the issue. We compiled postgres from the Red Hat source RPM. The only modification that we make is the config file. The OS is CentOS 5.4, 32-bit. Hardware: IBM 3650 2 x Dual Core Intel Xeon 5160 @ 3.00 GHz 16 GB memory 6 x 146 GB SAS 10K RPM in RAID-5 Please note that while we can reproduce the issue, access to the test system is fairly restrictive and test runs take a while, so I'd like to get requests for additional information-collecting all at once if possible. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 2:10 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Well, you just said when to force the reload to take effect: at checkpoint time. IIRC we already multiplex SIGUSR1, is that possible to add that behavior here? And signal every backend at checkpoint time when wal_level has changed? Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. Have the backends show their current wal_level in their PGPROC entries. Sleep till they're all reporting the right thing, then fire checkpoint. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
Excerpts from Josh Berkus's message of Thu Jan 20 18:05:15 -0300 2011: One of our clients is seeing an unusual issue with statements which get stuck waiting, sleeping forever and never completing. This first e-mail is for a "has anyone else seen this problem before?" while we try to collect additional information for a diagnosis. I have seen it -- on 8.1 too. In our case it was caused by an insert that was doing lots of toast insertions, so it needed to grab the extension lock frequently for the toast table; and this was slowed down by a largish shared_buffers setting, somehow (8.1 doesn't have lock partitioning, so this was expensive). I don't recall details on why these were related. If this is your case too, I doubt you'd be able to reproduce it in 9.0 (not even in 8.2, which is when lock partitioning was introduced). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
Noah Misch n...@leadboat.com writes: Heikki's suggestion seemed straightforward, so much so that I couldn't figure why nobody had done it. That would usually mean I'm missing something. If you're willing to substitute an incompatible table, it's not clear why you don't just do begin; drop table t; alter table t_new rename to t; commit; There are some implementation issues with this: concurrent accesses are likely to end up failing with "relation with OID nnn doesn't exist", because backends translate the table's name to OID before acquiring lock. But you'd have to solve those issues anyway to make an ALTER REPLACE WITH work as transparently as you seem to hope it would. Unless the idea here is to also have t_new acquire t's OID, and that is an absolute complete won't-happen if you're not enforcing a pretty thorough level of compatibility between the two tables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 03:06, Nathan Boley wrote: And actually it does not depend on ndistinct for the columns only, it depends on ndistinct estimates for the combination of columns. So improving the ndistinct estimates for columns is just a necessary first step (and only if it works reasonably well, we can do the next step). I think that any approach which depends on precise estimates of ndistinct is not practical. I'm not aware of any other approach to the 'discrete fail case' (where the multi-dimensional histograms are not applicable). If someone finds a better solution, I'll be the first one to throw away this stuff. I am very happy that you've spent so much time on this, and I'm sorry if my previous email came off as combative. My point was only that simple heuristics have served us well in the past and, before we go to the effort of new, complicated schemes, we should see how well similar heuristics work in the multiple column case. I am worried that if the initial plan is too complicated then nothing will happen and, even if something does happen, it will be tough to get it committed (check the archives for cross-column stat threads - there are a lot). If I've learned one thing over the years in IT, it's not to take critique personally. All the problems mentioned in this thread are valid concerns, pointing out weak points of the approach. And I'm quite happy to receive this feedback - that's why I started it. On the other hand - Jara Cimrman (a famous Czech fictional character, depicted as the best scientist/poet/teacher/traveller/... - see [1]) once said that you can't be really sure you don't get gold by blowing cigarette smoke into a basin drain, until you actually try it. So I'm blowing cigarette smoke into the drain ... It may very well happen that this will be a dead end, but I'll do my best to fix all the issues or to prove that the pros outweigh the cons. And even if it is eventually rejected, I hope to get a -1 from TL to be eligible for that t-shirt ... [1] http://en.wikipedia.org/wiki/Jara_Cimrman regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, is it just me, or is the terminology "number filtered" pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps "nremoved" would be clearer. I think "filtered" is pretty clear and like it... "removed" sounds like you deleted something. Well, you did delete something, no? There are rows that aren't in the output that would have been there if not for the filter condition. And, btw, one person thinking it's clear doesn't make it so. There are actually three numbers that could be involved here: the number of rows arriving at the filter, the number passed by it, and the number rejected by it. I think that "nfiltered" could possibly mean any of those three. A non-native speaker of English would be even less likely to be sure of what was meant. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
I have seen it -- on 8.1 too. On our case it was caused by an insert that was doing lots of toast insertions, so it needed to grab the extension lock frequently for the toast table; and this was slowed down by a largish shared_buffers setting, somehow (8.1 doesn't have lock partitioning, so this was expensive). I don't recall details on why these were related. If this is your case too, I doubt you'd be able to reproduce it in 9.0 (not even in 8.2 which is when lock partitioning was introduced). Thanks, I'll bet that's the case. This is happening on machines with more RAM, so they've increased shared_buffers. Now, to get them off 8.1. Been trying for over a year now ... -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, 2011-01-20 at 13:14 -0500, Noah Misch wrote: When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right Thing for that query to wake up and proceed based on the complete, final state of that committed DDL. Aside from the waiting itself, the query should behave as though it started after the DDL completed. In my example, the SELECT silently reads data from a table named old_t. What if that were an INSERT? The data falls in the wrong table. Heikki's suggestion, and your patch, contain no checking to see whether the old and new tables are similar. If they are not similar then we have all the same problems raised by my patch. SQL will suddenly fail because columns have ceased to exist, FKs suddenly disappear, etc. Indeed, Heikki's suggestion and my patch would not do such verification. I can't see us detecting and blocking more patterns of ALTER TABLE RENAME or DROP ...; CREATE ...; than we allow today. Those need to stay open-ended, with the user responsible for choosing well. So, what's the right concurrent behavior around use of those statements? I answer that above. That said, I see utility in a feature that compares two tables, swaps them if similar, and fixes up foreign keys. Having such a feature does not justify wrong concurrent behavior around ALTER TABLE RENAME. Having right concurrent behavior around ALTER TABLE RENAME does not remove the utility of this feature. We should do both. I agree that the DDL behaviour is wrong and should be fixed. Thank you for championing that alternative view. Swapping based upon names alone works and is very flexible, much more so than EXCHANGE could be. A separate utility might be worth it, but the feature set of that should be defined in terms of correctly-working DDL behaviour. It's possible that no further requirement exists. I remove my own patch from consideration for this release. I'll review your patch and commit it, problems or objections excepted. I haven't looked at it in any detail. Having said that, writing the patch did nothing to convince me this was the correct approach. Reviews should be reviews; they are not an opportunity to provide your own alternate version of a patch. That just confuses things and creates a competitive, not a cooperative, environment. Authors do need to listen to reviewers, so I hope I'm demonstrating that here. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
Josh Berkus j...@agliodbs.com writes: One of our clients is seeing an unusual issue with statements which get stuck waiting, sleeping forever and never completing. This first e-mail is for a "has anyone else seen this problem before?" while we try to collect additional information for a diagnosis. An strace on the process shows it to be in RECV, and otherwise doing nothing. I would take that to mean that it's waiting on the client. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
After playing with this in benchmarks and researching the weird results I got, I'm going to advise dropping the todo for now unless something happens to change how Postgres handles clustering. You guys probably already grokked this, so I am just recording it for the list archives. The primary factor here is that Postgres doesn't maintain clustered indexes. Clustering is a one-time operation that clusters the table at the current point in time. Basically, there really isn't any such thing in Postgres as a clustered index. There is an operation - Cluster - which takes an index and a table as input and re-orders the table according to the index. But it is borderline fiction to call the index used "clustered", because the next row inserted will pop in at the end of the table instead of slipping into the middle of the table per the desired ordering. All the pg_* table cluster candidates are candidates because they have a row per table column and we expect that a query will want to get several of these rows at once. These rows are naturally clustered because the scripts that create them insert their information into the catalog contiguously. When you create a catalog table, the pg_attribute rows for its columns are inserted together. When you then create all its triggers, they too are put into pg_triggers one after the other. So calling the Cluster operation after dbinit doesn't help anything. Over time, table alterations can fragment this information. If a user loads a bunch of tables, then alters them over time, the columns added later on will have their metadata stored separately from the columns created originally. Which gets us to the down and dirty of how the Cluster function works. It puts an access exclusive lock on the entire table - blocking all attempts to read and write to the table - creates a copy of the table in the desired order, drops the original, and renames the copy. Doing this to a catalog table that is relevant to queries pretty much brings everything else in the database to a halt while the system table is locked up. And the brute-force logic makes this time-consuming even if the table is perfectly ordered already. Additionally, snapshots taken of the table during the Cluster operation make the table appear to be empty, which introduces the possibility of system table corruption if transactions are run concurrently with a Cluster operation. So basically, the Cluster operation in its current form is not something you want running automatically on a bunch of system tables as it is currently implemented. It gives your system the hiccups. You would only want to run it manually during downtime. And you can do that just as easily with or without any preparation during dbinit. Thanks everyone, -Simone Aiken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
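A quick way to see the one-shot nature of the Cluster operation described above, using a throwaway table with made-up names (illustrative only):

    CREATE TABLE demo (k int, v text);
    CREATE INDEX demo_k_idx ON demo (k);
    INSERT INTO demo SELECT g, 'x' FROM generate_series(1, 1000) g ORDER BY random();

    -- one-time rewrite in index order; takes AccessExclusiveLock on demo
    CLUSTER demo USING demo_k_idx;

    -- new rows are simply appended to the end of the heap;
    -- nothing keeps the table in k order afterwards
    INSERT INTO demo VALUES (0, 'appended last');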
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 03:36, Robert Haas wrote: On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra t...@fuzzy.cz wrote: Regarding the crash scenario - if the commit fails, just throw away the local estimator copy, it's not needed. I'm not sure how to take care of the case when commit succeeds and the write of the merged estimator fails, but I think it might be possible to write the estimator to xlog or something like that. So it would be replayed during recovery etc. Or is it a stupid idea? It's not stupid, in the sense that that is what you'd need to do if you want to avoid ever having to rescan the table. But it is another thing that I think is going to be way too expensive. Way too expensive? All you need to put into the logfile is a copy of the estimator, which is a few kB. How is that 'way too expensive'? At this point, this is all a matter of religion, right? Neither of us has a working implementation we've benchmarked. But yes, I believe you're going to find that implementing some kind of streaming estimator is going to impose a... [pulls number out of rear end] 6% performance penalty, even after you've optimized the living daylights out of it. Now you might say... big deal, it improves my problem queries by 100x. OK, but if you could get the same benefit by doing an occasional full table scan during off hours, you could have the same performance with a *0%* performance penalty. Even better, the code changes would be confined to ANALYZE rather than spread out all over the system, which has positive implications for robustness and likelihood of commit. Good point. What I was trying to do was to continuously update the estimator with new data - that was the whole idea behind the collecting of new values (which might lead to problems with memory etc. as you've pointed out) and updating a local copy of the estimator (which is a good idea, I think). But this might be another option - let the user decide if he wants to continuously update the estimates (and pay the price) or do that during off hours (and pay almost nothing). That sounds like a very good solution to me. I'm not trying to argue you out of working on this. It's obviously your time to spend, and if it works better than I think it will, great! I'm merely offering you an opinion on what will probably happen if you go this route - namely, it'll carry an unpalatable run-time penalty. That opinion may be worth no more than what you paid for it, but there you have it. Yes, and I appreciate all feedback. But I still believe this can be done so that users who don't need the feature don't pay for it. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
I would take that to mean that it's waiting on the client. You mean that the client timed out and isn't accepting data from the query anymore? Shouldn't Postgres time out on that after a while? In one case, the orphaned statement was 16 hours old before we killed it. If it's relevant, the client connection is from a C application via libpq on localhost. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 09:10, Heikki Linnakangas wrote: It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate? I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice. The estimator for two columns and query 'A=a AND B=b' is about 0.5 * (dist(A)/dist(A,B) * Prob(A=a) + dist(B)/dist(A,B) * Prob(B=b)) so it's quite simple. It's not that sensitive to errors in the ndistinct estimates for individual columns, but the problem is in the multi-column ndistinct estimates. It's very likely that with dependent columns (e.g. with the ZIP codes / cities) the distribution is so pathological that the sampling-based estimate will be very off. I guess this was way too short an analysis, but if you can provide more details of the expected tests etc. I'll be happy to provide that. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
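To make the formula concrete, a purely illustrative calculation with made-up numbers for a fully dependent two-column case (one ZIP code per city, which is a simplification of the ZIP code / city example):

    dist(A)   = 100      -- e.g. 100 cities
    dist(B)   = 100      -- e.g. 100 ZIP codes
    dist(A,B) = 100      -- fully dependent: each city pairs with exactly one ZIP code
    Prob(A=a) = 0.01
    Prob(B=b) = 0.01

    estimate = 0.5 * (100/100 * 0.01 + 100/100 * 0.01) = 0.01

    -- the independence assumption would instead give 0.01 * 0.01 = 0.0001,
    -- a 100x underestimate for this dependent case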
Re: [HACKERS] SSI and Hot Standby
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 20.01.2011 03:05, Kevin Grittner wrote: If we don't do something like this, do we just provide REPEATABLE READ on the standby as the strictest level of transaction isolation? If so, do we generate an error on a request for SERIALIZABLE, warn and provide degraded behavior, or just quietly give them REPEATABLE READ behavior? +1 for generating an error. Before I go do that, I want to be sure everyone is clear about the state of things. If SSI is used to provide data integrity on the master, it will prevent any serialization anomalies from being persisted on any hot standby *long term*. For example, at any point where the standby is at a point in the transaction stream where there were no read/write transactions active, no anomalies can be observed. (That isn't the *only* time; it's just the simplest one to describe as an example.) Queries on the standby can, however, see *transient* anomalies when they run queries which would cause a serialization failure if run on the master at the same point in the transaction stream. This can only occur when, of two concurrent transactions, the one which *appears* to run second because the other can't read what it wrote, *commits* first. The most common and alarming situation where this occurs, in my opinion, is batch processing. This is extremely common in financial applications, and tends to show up in a lot of other places, too. (The receipting query set is an instance of this type of problem, but I'm going to keep it more general in hopes that people can see where it impacts them.) Imagine an application which has some small control record in a table, and inserts to some other table are assigned to a batch based on the control record. The batches are normally identified by ascending dates or serial numbers. Periodically a new batch is opened and the old batch is closed by updating a current batch id column in the control table. If the batch ID is updated and the transaction in which that update was executed commits while a transaction which read the old batch ID is still in flight, a read of the database will show that the batch is closed, but if you look at the detail of the batch, it will not yet be complete. Under SSI, one of these transactions will be canceled to prevent this. Our implementation will always allow the update which closes the batch to complete, and either the insert or the select of the detail will be rolled back with a serialization failure, depending on the timing of the actions inside those transactions. If the insert fails, it can be retried, and will land in the new batch -- making the listing of the batch which omits it OK. If the listing of the batch details is canceled, it will be because the insert into the old batch committed before it recognized the problem, so an immediate retry of the select will see the complete batch contents. A hot standby can't really take part in the predicate locking and transaction cancellation on the master. Dan and I have both come to the conclusion that the only reasonable way to allow hot standby to work with SSI is for the WAL (when wal_level = hot_standby) to contain information about when snapshots develop which won't see such a state. 
In the above example, barring some throttling mechanism skipping these particular snapshots, or other problematic conflicts around the same time, the master would tell the standby that the snapshot before either of the two problem transactions was OK, and then it would tell them that the snapshot after both had committed was OK. It would not suggest using the snapshot available between the commit of the control record update and the commit of the insert into the batch. This seems to me to be not completely unrelated to the snapshot synchronization patch. It is clearly closely related to the READ ONLY DEFERRABLE mode, which also looks for a snapshot which is immune to serialization anomalies without predicate locking, conflict detection, transaction cancellation, etc. Melding these two things with hot standby seems to be beyond what can reasonably happen for 9.1 without delaying the release. If someone is using one feature and not the other, they really don't have a problem. Like anyone else, if a hot standby user has been using SERIALIZABLE mode under 9.0 or earlier, they will need to switch to REPEATABLE READ. A SERIALIZABLE user who doesn't set up hot standby has no issue. Opinions so far seem to be in favor of reporting an error on the standby if SERIALIZABLE is requested, so that people don't silently get less protection than they expect. The most annoying thing about that is that if the user would *like* to use truly serializable transactions on the standby, and will do so when they get it in 9.2, they must switch to REPEATABLE READ now, and switch back to SERIALIZABLE with the next release. So, based on a more complete description of the issues, any more opinions on whether to generate the error, as suggested by Heikki? Does anyone think this justifies the compatibility GUC as suggested by Jeff? -Kevin
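A minimal SQL sketch of the batch scenario described above, with hypothetical table names; on the master, SSI would cancel one of the two read/write transactions rather than let the report see a closed-but-incomplete batch:

    CREATE TABLE control (current_batch int NOT NULL);
    CREATE TABLE receipt (batch int NOT NULL, amount numeric NOT NULL);

    -- Session 1: insert a receipt into the current batch
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO receipt SELECT current_batch, 100.00 FROM control;
    -- ... commits only after session 2 below has committed

    -- Session 2: close the batch, concurrently
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE control SET current_batch = current_batch + 1;
    COMMIT;

    -- Session 3: report on the batch that now looks closed
    SELECT * FROM receipt
     WHERE batch = (SELECT current_batch - 1 FROM control);

The open question above is what a standby running the session-3 report should be allowed to see while session 1 is still in flight.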
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 11:05, Csaba Nagy wrote: Hi Tomas, On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote: No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table. In the cases where sampling is enough, you can do that for the updates too: do a sampling on the changes, in that you only let every Nth change make it to the estimator. If you can also dynamically tune the N to grow it as the statistics stabilize, and lower it if you detect high variance, even better. If the analyze process could be decoupled from the backends, and maybe just get the data passed over to be processed asynchronously, then that could be a feasible way to have always up-to-date statistics when the bottleneck is IO and CPU power is in excess. If that then leads to better plans, it could really be a win exceeding the overhead. OK, this sounds interesting. I'm not sure how to do that but it might be a good solution. What about transactions? If the client inserts data (and it will be sent asynchronously to update the estimator) and then rolls back, is the estimator 'rolled back' or what happens? This was exactly the reason why I initially wanted to collect all the data at the backend (and send them to the estimator at commit time). Which was then replaced by the idea to keep a local estimator copy and merge it back to the original estimator at commit time. If this analyze process (or more of them) could also just get the data from the modified buffers in a cyclic way, so that backends need nothing extra to do, then I don't see any performance disadvantage other than possible extra locking contention on the buffers and non-determinism of the actual time when a change makes it to the statistics. Then you just need to get more CPU power and higher memory bandwidth to pay for the accurate statistics. Well, the possible locking contention sounds like quite a significant problem to me :-( The lag between an update and a change to the stats is not that big a deal I guess - we have the same behaviour with the rest of the stats (updated by autovacuum every once in a while). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Kevin, So, based on a more complete description of the issues, any more opinions on whether to generate the error, as suggested by Heikki? If it's a choice between generating an error and letting users see inconsistent data, I'll take the former. Does anyone think this justifies the compatibility GUC as suggested by Jeff? I think it might, yes. Since someone could simply turn on the backwards compatibility flag for 9.1 and turn it off for 9.2, rather than trying to mess with transaction states which might be set in application code. Unfortunately, people have not responded to our survey :-( http://www.postgresql.org/community/survey.77 -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
Josh Berkus j...@agliodbs.com writes: I would take that to mean that it's waiting on the client. You mean that the client timed out and isn't accepting data from the query anymore? No, if the backend is in RECV state, it's waiting for the client to *send* it something. (Although if this is an SSL connection, it's a bit harder to be sure about what the logical state of the connection is.) Shouldn't Postgres time out on that after a while? Not if the problem is the client is confused. As long as the remote-end kernel doesn't indicate the connection is dead, we'll wait for the client to wake up and send us a command. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] One Role, Two Passwords
Hello list, I wanted to test the waters on how receptive people might be to an extension that would allow Postgres to support two passwords for a given role. I have recently encountered a case where this would be highly useful when performing rolling password upgrades across many client applications and/or application instances. It is possible (as far as I know) to get around some of the stickier parts of this with some teeth gnashing, using some CREATE ROLE ... IN ROLE dancing, but I wanted to see if there was any interest in supporting this for real. This design is not uncommon; one example is Amazon Web Services (e.g. EC2, S3), whereby one identification key can have many, independently revokable secret keys. I haven't given much thought to the mechanism yet; rather, I am just trying to assess gut reactions on the principle. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] One Role, Two Passwords
On 01/20/2011 05:28 PM, Daniel Farina wrote: Hello list, I wanted to test the waters on how receptive people might be to an extension that would allow Postgres to support two passwords for a given role. I have recently encountered a case where this would be highly useful when performing rolling password upgrades across many client applications and/or application instances. It is possible (as far as I know) to get around some of the sticker parts of this with some teeth gnashing, using some CREATE ROLE ... IN ROLE dancing, but I wanted to see if there was any interest in supporting this for real. This design is not uncommon, one example is Amazon Web Services (e.g. EC2, S3), whereby one identification key can have many, independently revokable secret keys. I haven't given much thought to the mechanism yet, rather, I am just trying to assess gut reactions on the principle. Have you thought of trying to use an external auth source like LDAP for such a scheme? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: Heikki's suggestion seemed straightforward, so much so that I couldn't figure why nobody had done it. That would usually mean I'm missing something. If you're willing to substitute an incompatible table, it's not clear why you don't just do begin; drop table t; alter table t_new rename to t; commit; Because the whole source of this problem is dependency hell. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] One Role, Two Passwords
On Thu, Jan 20, 2011 at 2:45 PM, Andrew Dunstan and...@dunslane.net wrote: Have you thought of trying to use an external auth source like LDAP for such a scheme? I have thought about that, although LDAP is the only one that came to mind (I don't know a whole lot of systems in detail, only by name... so suggestions welcome for low-administrative-overhead variants). I also briefly considered investigating what hooks I could exploit for auth; I do not know these very well right now. It would be ideal not to have to run another full-bore set of services to support phased password rotation, though -- in this case it would still appear to be better, if frustrating, to use the CREATE ROLE ... IN ROLE dance. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Large object corruption during 'piped' pg_restore
Bosco Rama postg...@boscorama.com writes: If 'standard_conforming_strings = on' is set in our DB (which is required for our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql) results in the large objects being corrupted. All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 LTS with all current updates applied. I've been able to replicate this in 8.4; it doesn't happen in 9.0 (but probably does in all 8.x versions). The problem is that pg_dump (or in this case really pg_restore) is relying on libpq's PQescapeBytea() to format the bytea literal that will be given as argument to lowrite() during the restore. When pg_dump is producing SQL directly, or when pg_restore is connected to a database, PQescapeBytea() mooches the standard_conforming_strings value from the active libpq connection and gets the right answer. In the single case where pg_restore is producing SQL without ever opening a database connection, PQescapeBytea doesn't know what to do and defaults to the old non-standard-strings behavior. Unfortunately pg_restore set standard_conforming_strings=on earlier in the script (if it was set in the original source database) so you get the wrong thing. The bottom line is that pg_dump can't depend on libpq's PQescapeBytea, but needs its own copy. We have in fact done that as of 9.0, which is what I was vaguely remembering: Author: Tom Lane t...@sss.pgh.pa.us Branch: master Release: REL9_0_BR [b1732111f] 2009-08-04 21:56:09 + Fix pg_dump to do the right thing when escaping the contents of large objects. The previous implementation got it right in most cases but failed in one: if you pg_dump into an archive with standard_conforming_strings enabled, then pg_restore to a script file (not directly to a database), the script will set standard_conforming_strings = on but then emit large object data as nonstandardly-escaped strings. At the moment the code is made to emit hex-format bytea strings when dumping to a script file. We might want to change to old-style escaping for backwards compatibility, but that would be slower and bulkier. If we do, it's just a matter of reimplementing appendByteaLiteral(). This has been broken for a long time, but given the lack of field complaints I'm not going to worry about back-patching. I'm not sure whether this new complaint is enough reason to reconsider back-patching. We cannot just backport the 9.0 patch, since it assumes it can do bytea hex output --- we'd need to emit old style escaped output instead. So it's a bit of work, and more to the point would involve pushing poorly-tested code into stable branches. I doubt it would go wrong, but in the worst-case scenario we might create failures for blob-restore cases that work now. So I'm not sure whether to fix it, or leave it as a known failure case in old branches. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 09:36:11PM +, Simon Riggs wrote: I agree that the DDL behaviour is wrong and should be fixed. Thank you for championing that alternative view. Swapping based upon names only works and is very flexible, much more so than EXCHANGE could be. A separate utility might be worth it, but the feature set of that should be defined in terms of correctly-working DDL behaviour. It's possible that no further requirement exists. I remove my own patch from consideration for this release. I'll review your patch and commit it, problems or objections excepted. I haven't looked at it in any detail. Thanks. I wouldn't be very surprised if that patch is even the wrong way to achieve these semantics, but it's great that we're on the same page as to which semantics they are. Having said that, writing the patch did nothing to convince me this was the correct approach. Reviews should be reviews, they are not an opportunity to provide your own alternate version of a patch. That just confuses things and creates a competitive, not a cooperative environment. Authors do need to listen to reviewers, so I hope I'm demonstrating that here. Understood. I can see now that posting a second code patch, however framed, in the same thread creates a presumption of aggression that is difficult to dispel. I will have a lot to think about before doing that again. Thanks for giving this discussion, which started poorly due to my actions, a second chance. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: If you're willing to substitute an incompatible table, it's not clear why you don't just do begin; drop table t; alter table t_new rename to t; commit; Because the whole source of this problem is dependency hell. Well, if you want to preserve dependencies, you can *not* just blindly substitute an incompatible table. You must ensure that views and foreign keys referencing the table are still valid. So I'm not sure where anybody got the idea that an implementation that fails to check all that is even worth presenting. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
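A minimal sketch of what goes wrong with the plain DROP/RENAME recipe once anything references the table, using hypothetical names:

    CREATE TABLE t (id int PRIMARY KEY);
    CREATE TABLE t_new (id int PRIMARY KEY);
    CREATE VIEW v AS SELECT id FROM t;    -- v now depends on t

    BEGIN;
    DROP TABLE t;                         -- fails: v depends on t, aborting the transaction
    ALTER TABLE t_new RENAME TO t;
    COMMIT;

    -- DROP TABLE t CASCADE would get past the error, but it silently drops v,
    -- and the renamed t_new does not pick up v or any foreign keys that
    -- pointed at the old t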
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. Why not? We already generate appropriate snapshots for this in SSI, so is the problem in getting the appropriate information into the WAL stream or in having a request for a snapshot within a serializable transaction while running in hot standby the problem? It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. Because the standby would need to bombard the server with a stream of predicate lock information, we would need to allow transactions on the master to be canceled due in part to activity on the standby, and I don't even know how you would begin to track read/write conflicts between transactions on two different clusters. If any of that didn't make sense, it would probably be more efficient for everyone involved if those interested browsed the Overview section of the Wiki page than to have me duplicate its contents in a post. http://wiki.postgresql.org/wiki/Serializable -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On 1/20/2011 12:47 PM, Tom Lane wrote: So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x > 42) (nfiltered = 123) or Filter Cond: (x > 42) Rows Filtered: 123 I'd prefer the latter. Sometimes the Filter Cond is very complex and finding the nfiltered information would be easier if it always had its own row. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Jan21, 2011, at 00:11 , Simon Riggs wrote: It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. The problem is that snapshots taken on the master sometimes represent a state of the database which cannot occur under any (valid) serial schedule. Hence, if you use that snapshot to read the *whole* database, you've surely violated serializability. If you read only parts of the database, things may or may not be fine, depending on the parts you read. To have the same stringent guarantees that SERIALIZABLE provides on the master also for queries run against the slave, you somehow need to prevent this. The easiest way is to only use snapshots on the slave which *cannot* produce such anomalies. We already know how to generate such snapshots - SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question is mainly how to transfer such snapshots to the slave, and how often we transmit a new one. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
I wrote: Why not? We already generate appropriate snapshots for this in SSI, so is the problem in getting the appropriate information into the WAL stream or in having a request for a snapshot within a serializable transaction while running in hot standby the problem? I dropped a few words. That was supposed to ask whether the problem was in getting hot standby to *use such a snapshot*. I'm open to other suggestions on how else we might do this. I don't see any alternatives, but maybe you're seeing some possibility that eludes me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
On 1/20/11 2:26 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I would take that to mean that it's waiting on the client. You mean that the client timed out and isn't accepting data from the query anymore? No, if the backend is in RECV state, it's waiting for the client to *send* it something. I don't think that's consistent with what we're seeing except maybe in the BIND case. In the other cases, there's a query supposedly executing. But, will look for that possibility. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] One Role, Two Passwords
Daniel Farina drfar...@acm.org writes: I wanted to test the waters on how receptive people might be to an extension that would allow Postgres to support two passwords for a given role. Not very. Why don't you just put two roles in the same group? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
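A rough sketch of that workaround, with hypothetical role names; privileges are granted only to the group role, and each secret is a separate login role that can be retired independently:

    CREATE ROLE app NOLOGIN;                                       -- holds the privileges
    CREATE ROLE app_key1 LOGIN PASSWORD 'old-secret' IN ROLE app;
    CREATE ROLE app_key2 LOGIN PASSWORD 'new-secret' IN ROLE app;

    -- roll clients over from app_key1 to app_key2, then retire the old credential
    ALTER ROLE app_key1 NOLOGIN;   -- or DROP ROLE app_key1;

    -- clients can SET ROLE app after connecting if objects should be
    -- accessed under the shared role

Object grants then refer only to the group role, so rotating the member roles does not disturb permissions.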
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. I'm pretty concerned about the performance implications, too. In particular that sounds like you could get an unbounded amount of WAL emitted from a *purely read only* transaction flow. Which is not going to fly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers