[HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?
Hi hackers, The project I'm currently working on, fsnapshot[1], is written in plain plpgsql and I need to sort all the oids in their creatable/droppable order. This has already been properly implemented in pg_dump_sort.c using Knuth's algorithm for topological sorting, with some special magic to find and break dependency loops. It's not possible to use a plain recursive query to do the trick (due to 'i' bidirectional dependencies and dependency loops). I need a general approach, only making use of pg_depend. The function should take no input arguments and the output argument should be oid[], containing a list of the oids in creatable/droppable order. It doesn't matter if it is left-to-right or least-number-of-edges-first; any valid topological sort will do. I'm sure it's possible to implement it in plpgsql or plperl, but I wanted to check first if anyone has already made such a function, to hopefully save some time? Thanks a lot! [1] https://github.com/gluefinance/fsnapshot -- Best regards, Joel Jacobson Glue Finance -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
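As a starting point for a plpgsql or plperl port, here is a minimal sketch of the standard topological sort (Kahn's algorithm, essentially the Algorithm T that pg_dump_sort.c implements) over pg_depend-style (objid, refobjid) edges. Python is used only for brevity; the function name and edge representation are my own, and the loop-breaking magic of pg_dump_sort.c is deliberately left out. Translating the loop into plpgsql with an oid[] accumulator should be mechanical:

```python
from collections import defaultdict, deque

def topo_sort(oids, depends_on):
    """Kahn's algorithm.  depends_on is a list of (objid, refobjid)
    pairs, as in pg_depend: objid depends on refobjid.  Returns the
    oids in creatable order: every object appears after the objects
    it needs.  Raises on dependency loops instead of breaking them."""
    indegree = {oid: 0 for oid in oids}
    dependents = defaultdict(list)
    for objid, refobjid in depends_on:
        indegree[objid] += 1
        dependents[refobjid].append(objid)
    # Objects that depend on nothing can be created first.
    queue = deque(oid for oid in oids if indegree[oid] == 0)
    order = []
    while queue:
        oid = queue.popleft()
        order.append(oid)
        for dep in dependents[oid]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    if len(order) != len(oids):
        raise ValueError("dependency loop among remaining oids")
    return order
```

Reversing the result gives a droppable order; handling loops the way pg_dump_sort.c does (finding and breaking them) is the part that still needs real work.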
Re: [HACKERS] Extension upgrade, patch v0: debug help needed
Robert Haas robertmh...@gmail.com writes: Committed. Thanks! -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Upgrading Extension, version numbers
David E. Wheeler da...@kineticode.com writes: Just so long as you're aware that you might get more challenges on this going forward. Sure, thanks for the reminder. That said I also remember the reaction when I used to scan the SHARE/contrib directory to find the extension control file having the right name property, and I don't see scanning the same directory in order to find out which upgrade file to consider depending on several parts of its name as so different. Current code allows you to use the same upgrade script for more than one source version, and does so in a way that it's easy to determine which upgrade file to seek for. I don't see the benefit of having the 'relocatable' property optional in the control file, but I see a huge drawback. Requiring it will force extension authors to at least have a glance at the docs to understand how to set it. It's important not to overlook it. I guess. I'll have to think about how to support it in PGXN, though. And the upgrade keys if they stay in. Disclaimer: the following is based on my understanding of how you want to bundle things, from several discussions we had together at pubs or on IRC, so please don't read further if you've changed your mind about generating the control file from your PGXN YAML specification. Well, I think you're having a dependency inversion problem here. PGXN depends on extensions, not the other way round. Also, I really expect the extension facility to be mainly used for internal proprietary code, mainly procedure collections, and only occasionally for publishing Open Source components. So you should be considering the control file as an input to your processes, a source file, not something that your service will hide from extension authors: there's no benefit that I can see in doing so.
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] regclass without error?
A long time ago, I proposed a regclass-like function which does not throw an error if the table is not found. Instead I want to let it return InvalidOid or NULL. Tatsuo Ishii is...@postgresql.org writes: Is there any way to use regclass without having ERROR? pgpool-II needs to find the oid from the table name, and for that purpose it issues something like SELECT 'table_name'::regclass::oid. Problem is, if the table does not exist, an error occurs and the transaction aborts. Ideally, if the table does not exist, the SELECT returns 0 (InvalidOid). I don't think the cast should act that way, but I could see providing a separate conversion function that returns 0 ... or perhaps better NULL ... if no match. Such a function would be very helpful. Great! I made a pretty simple function for this. The essential part is something like this: Datum pgpool_regclass(PG_FUNCTION_ARGS) { char *pro_name_or_oid = PG_GETARG_CSTRING(0); Oid result; PG_TRY(); { result = DirectFunctionCall1(regclassin, CStringGetDatum(pro_name_or_oid)); } PG_CATCH(); { result = InvalidOid; } PG_END_TRY(); PG_RETURN_OID(result); } IMO this implementation is the least invasive but not so elegant. Before proposing more complete patches, I would like to hear comments: which way should I go? The least invasive one like the above? Or should I refactor regclassin, for example implementing a regclassin_guts which does the essential job, and making wrapper functions, one being the existing regclass, and the other acting as the new one? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] back branches vs. VS 2008
On Tue, Jan 4, 2011 at 04:49, Andrew Dunstan and...@dunslane.net wrote: On 01/03/2011 12:15 PM, I wrote: The following patch allows me to build the 8.3 and 8.4 branches using Visual Studio 2008, once the build system is patched. But I don't really know why. HEAD and 9.0 build fine without it. But those branches fail with a complaint about IPPROTO_IPV6 being undefined. The patch seems harmless enough. But I'd like to know why it's happening. Does anyone have a clue? -#ifdef IPV6_V6ONLY +#if defined(IPV6_V6ONLY) && defined(IPPROTO_IPV6) if (addr->ai_family == AF_INET6) { if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, OK, what's going on here is that, in the newer SDK, IPV6_V6ONLY is defined unconditionally, but IPPROTO_IPV6 is only defined if _WIN32_WINNT is set to 0x0501 or higher. We defined _WIN32_WINNT as 0x0500 until 9.0, when we changed it specifically to allow use of the right IPV6 settings. I wonder if anything else changed with that #define, though. This seems to me like a clear error in the MS headers. I don't think it makes any sense to define the settings constant but not the context constant. The fix I have suggested above doesn't seem unreasonable or terribly unsafe in these circumstances. The code clearly contemplates the setsockopt() call in question not having been run, as shown in this comment: Yeah, it seems reasonable - I assume you tested it and it doesn't fail in some *different* way than the one we expect in the code? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. What I was trying to suggest upthread is that while there are other possible ways around this problem, the only one that has any hope of shipping with 9.1 is to do just that. So from my perspective, the rest of the discussion about the right way to proceed is moot for now. For some reason it didn't hit me until you said this that I could do the locking manually in my test case, without even touching the server-side code yet. Attached are a new pair of scripts where each pgbench UPDATE statement executes an explicit LOCK TABLE. Here's the result of a sample run here: $ pgbench -f update-merge.sql -T 60 -c 16 -j 4 -s 2 pgbench starting vacuum...end. transaction type: Custom query scaling factor: 2 query mode: simple number of clients: 16 number of threads: 4 duration: 60 s number of transactions actually processed: 84375 tps = 1405.953672 (including connections establishing) tps = 1406.137456 (excluding connections establishing) $ psql -c 'select count(*) as updated FROM pgbench_accounts WHERE NOT abalance=0' -d pgbench updated - 68897 (1 row) $ psql -c 'select count(*) as inserted FROM pgbench_accounts WHERE aid > 10' -d pgbench inserted -- 34497 (1 row) No assertion crashes, no duplicate key failures. All the weird stuff I was running into is gone, so that's decent evidence the worst of the problems were all because the heavy lock I was expecting just wasn't integrated into the patch. Congratulations to Boxuan: for the first time this is starting to act like a viable feature addition to me, just one with a moderately long list of limitations and performance issues. 1400 TPS worth of UPSERT on my modest 8-core desktop (single drive with cheating fsync) isn't uselessly slow. If I add SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; just after the BEGIN;, I don't see any serialization errors, and performance is exactly the same.
Run a straight UPDATE over only the existing range of keys, and I get 7000 TPS instead. So the locking etc. is reducing performance to 20% of its normal rate, on this assertion+debug build. I can run this tomorrow (err, later today I guess looking at the time) on a proper system with BBWC and without assertions to see if the magnitude of the difference changes, but I don't think that's the main issue here. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. We'd need a big warning disclaiming that concurrent performance is very limited in this first release of the feature, but I don't know that this is at an unacceptable level of slow for smaller web apps and such. Until proper fine-grained concurrency is implemented, I think it would be PR suicide to release a version of this without a full table lock happening automatically though. The idea Robert advocated, that it would be possible for advanced users to use even this rough feature in a smarter way to avoid conflicts and not suffer the full performance penalty, is true. But if you consider the main purpose here to be making it easier to get smaller MySQL apps and the like ported to PostgreSQL (which is what I see as goal #1), putting that burden on the user is just going to reinforce the old "PostgreSQL is so much harder than MySQL" stereotype. I'd much prefer to see everyone have a slow but simple-to-use UPSERT via MERGE available initially, rather than to worry about optimizing for the advanced user in a way that makes life harder for the newbies. The sort of people who must have optimal performance already have trigger functions available to them, that they can write and tweak for best performance.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books test-merge.sh Description: Bourne shell script \set nbranches :scale \set ntellers 10 * :scale \set naccounts 10 * :scale \setrandom aid 1 :naccounts \setrandom bid 1 :nbranches \setrandom tid 1 :ntellers \setrandom delta -5000 5000 BEGIN; -- Optional mode change -- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; LOCK TABLE pgbench_accounts; MERGE INTO pgbench_accounts t USING (SELECT :aid,1+(:aid / 100)::integer,:delta,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler); COMMIT; -- This syntax worked with MERGE v203 patch, but isn't compatible with v204 --MERGE INTO pgbench_accounts t USING (VALUES (:aid,1+(:aid / 100)::integer,:delta,'')) AS
Re: [HACKERS] pg_dump --split patch
On 28.12.2010 22:44, Joel Jacobson wrote: Sent from my iPhone On 28 dec 2010, at 21:45, Gurjeet Singh singh.gurj...@gmail.com wrote: The problem I see with suffixing a sequence id to the objects with name collision is that one day the dump may name myfunc(int) as myfunc.sql and after an overloaded version is created, say myfunc(char, int), then the same myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic. I agree, good point! Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc., to reduce the need to truncate filenames. Also, it is a project policy that we do not introduce new features in back branches, so spending time on an 8.4.6 patch may not be the best use of your time. My company is using 8.4 and needs this feature, so I'll have to patch it anyway :) Start the easy way, by writing a (python|perl) filter for pg_dump -s output. Once this is done, convert it into a patch for pg_dump. Hannu Krosing http://www.2ndQuadrant.com/books/
Re: [HACKERS] pg_dump --split patch
On 28.12.2010 23:51, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 12/28/2010 04:44 PM, Joel Jacobson wrote: Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc., to reduce the need to truncate filenames. I think that's just horrible. Does the i stand for integer or inet? And it will get *really* ugly for type names with spaces in them ... You think spaces are bad, try slashes ;-) Not to mention the need for including schemas in typenames sometimes. I think you're going to have a real problem trying to fully describe a function's signature in a file name of reasonable max length. Something like funcname_number-of-arguments_hash_of_argument_type_list.sql seems like a reasonable compromise - you can find the function you are looking for without too much searching, even when overloaded, and the uniqueness is still guaranteed. Hannu Krosing http://www.2ndQuadrant.com/books/
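Hannu's funcname_number-of-arguments_hash_of_argument_type_list.sql scheme could be sketched like this (the helper name, hash length, and separator here are my own arbitrary choices, not anything from a patch):

```python
import hashlib

def split_filename(funcname, argtypes):
    """Build funcname_nargs_hash.sql: unique per signature while
    staying greppable by function name.  argtypes is the list of
    full (schema-qualified, if needed) type names."""
    digest = hashlib.md5(",".join(argtypes).encode()).hexdigest()[:8]
    return "%s_%d_%s.sql" % (funcname, len(argtypes), digest)
```

Two overloads of the same function then land in distinct, stable filenames regardless of the order in which they were created, which addresses the myfunc.sql vs. myfunc-2.sql non-determinism complained about upthread.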
Re: [HACKERS] pg_dump --split patch
On 28.12.2010 17:00, Joel Jacobson wrote: Dear fellow hackers, Problem: A normal diff of two slightly different schema dump files (pg_dump -s) will not produce a user-friendly diff, as you get all changes in the same file. Another Solution: I have used a python script for splitting pg_dump -s output into a directory structure for years. A structure that seems to work well is 1. database_name.load.sql - file containing \i for all other files 2. tree of schema1\ +- objtype1\ |+ obj_of_type1_name1.sql |+ ... +-objtype2\ ... That is, first have one directory per schema on the top level, then one directory for each type of object inside it, and one file per object. Some things which may have non-unique names, like functions, need extra uniquefication, like adding an underscore (or colon) separated list of argument types at the end. Most times you want to leave out the comment lines with OIDs so that you can diff the files against another version. -- Hannu Krosing http://www.2ndQuadrant.com/books/
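A rough sketch of such a splitter, matching the directory layout described above (the header regex and function name are my own; real pg_dump comment headers carry more fields, and overloaded functions still need the extra uniquefication mentioned):

```python
import os
import re

def split_schema_dump(dump_text, outdir):
    """Naive splitter for pg_dump -s output: a new file starts at each
    '-- Name: ...; Type: ...; Schema: ...' comment, and each object is
    written to outdir/<schema>/<type>/<name>.sql."""
    header = re.compile(r'-- Name: ([^;]+); Type: ([^;]+); Schema: ([^;]+)')
    files = {}
    path = None
    for line in dump_text.splitlines(keepends=True):
        m = header.search(line)
        if m:
            name = m.group(1).split('(')[0].strip()   # drop function args
            objtype = m.group(2).strip().lower().replace(' ', '_')
            schema = m.group(3).strip()
            path = os.path.join(outdir, schema, objtype, name + '.sql')
        if path is not None:
            files.setdefault(path, []).append(line)
    for p, lines in files.items():
        os.makedirs(os.path.dirname(p), exist_ok=True)
        with open(p, 'w') as f:
            f.writelines(lines)
    return sorted(files)
```

Everything before the first header comment is simply dropped here; a production version would also want to filter out OID-bearing comment lines so two dump trees diff cleanly.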
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Robert Haas wrote: And even if it isn't, the MERGE syntax is insane if what you really want to do is insert or update ONE record. If all we have is MERGE, people will keep doing it with a PL/pgsql stored procedure or some crummy application logic just so that they don't have to spend several days trying to understand the syntax. Heck, I understand the syntax (or I think I do) and I still think it's more trouble than it's worth. I hoped that the manual would have a clear example of "this is how you do UPSERT with MERGE", preferably cross-linked to the existing Example 39-2 (Exceptions with UPDATE/INSERT), the trigger implementation that's been the reference implementation for this for a long time, so people can see both alternatives. New users will cut and paste that example into their code, and in the beginning neither know nor care how MERGE actually works, so long as the example does what it claims. I would wager the majority of PL/pgsql implementations of this requirement start the exact same way. I don't think the learning curve there is really smaller, it's just that you've already been through it. I've been purposefully ignoring the larger applications of MERGE in the interest of keeping focus on a manageable subset. But the more general feature set is in fact enormously useful for some types of data warehouse applications. Build REPLACE, and you've built REPLACE. Build MERGE that is REPLACE now and eventually full high-performance MERGE, and you've done something with a much brighter future. I don't think the concurrency hurdles here are unique to this feature either, as shown by the regular overlap noted with the other serialization work.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] SSPI client authentication in non-Windows builds
* Robert Haas wrote: On Mon, Jan 3, 2011 at 8:11 AM, Christian Ullrich ch...@chrullrich.net wrote: this patch adds support for connecting to servers running on Windows and requesting SSPI authentication. It does this by treating AUTH_REQ_SSPI the same as AUTH_REQ_GSS if no native SSPI support is available. I have to confess that I don't know whether this is a good idea or a bad idea. Both GSS and SSPI have advantages and disadvantages. To use SSPI, your backends must run as a dedicated domain account, so if you use the binary installer, you have to change permissions on the data directory and reconfigure the service. On the other hand, you do not need a keytab. To use GSS, you need a keytab, but not a domain user account, and thus no domain at all. Earlier, I had the concern that using client-side GSSAPI (or the Kerberos SSPI package) to connect to a server using SSPI Negotiate (as the backend currently does) was a violation of the published protocol, but that the Negotiate SSP handled this by falling back to Kerberos. I would have been reluctant to rely on this behavior, which I thought undocumented. However, I just found the documentation that says this is all right: http://msdn.microsoft.com/en-us/library/aa378748(v=VS.85).aspx "A server that uses the Negotiate package is able to respond to client applications that specifically select either the Kerberos or NTLM security provider." This covers the case where the client is running Windows, because then libpq will actually use SSPI instead of GSSAPI, satisfying the letter of the documentation. By implication, because SSPI Kerberos is wire-level compatible with GSSAPI, it also covers the case where the client is running on another platform and uses native GSSAPI libraries to authenticate. If it were not for supporting NTLM through SSPI, it would even be possible to simply get rid of AUTH_REQ_SSPI entirely and let the server send AUTH_REQ_GSS for sspi lines in pg_hba.conf.
By doing this, no patches to the client libraries are needed, because both libpq and the JDBC driver will automatically do the right thing on all platforms. -- Christian Ullrich
[HACKERS] ALTER EXTENSION UPGRADE patch v1
Hi, Please find attached the patch to support ALTER EXTENSION UPGRADE, following exchanges made on this list — we can't really speak about decisions here, apparently, until commit is done :) The documentation is available online for easy browsing here: http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html The things we might want to discuss are, AFAIUI: - support for upgrading from pre-9.1 extensions, or to get from a bunch of related pl code to a proper extension in your database I've done the code for that to work; the idea is pretty simple and relies on the following points. Please note that the attached patch only implements support for upgrading some contrib modules from pre-9.1; that's because I'm willing to get some approval before continuing the effort — I foresee no roadblocks here. - CREATE WRAPPER EXTENSION If you want to upgrade an extension, it first has to exist as an extension. When you want to switch to an extension, you need a way to bootstrap your extension object, because you want an entry in the catalogs and you don't want to run the script: it has already been run before you decided this would make for an extension. This command allows you to register an extension in the catalogs, bypassing any script and forcing the version to NULL. I'm not sold on the 'WRAPPER' keyword here, I just used an existing one in the grammar that looked like a good candidate. This was the best I could find, but IANANS (I am not a native speaker). - ALTER EXTENSION ... UPGRADE This allows you to upgrade an existing extension, wrapper included. The extension author needs to prepare the upgrade scripts, and to set up in the control file which to use. The new extension's version is static, as the upgrade files you distribute depend on this (think packaging here), and so you pick the upgrade script file to execute depending only on the currently installed version.
This picking is done by regular expression matching (using the code backing the ~(text, text) operator), and supports null as a special case. The first match wins; other candidates are bypassed. - CREATE EXTENSION ... WITH SYSID ... Thinking about upgrades, I added support for that too. This might be necessary for pg_upgrade, but I'm not so sure. It was an experiment, but I thought we might as well talk about it before I remove it from the patch :) - ALTER OBJECT ... SET EXTENSION ... This implements the magic behind upgrading from null, as it allows users to add objects to an existing extension. Internally, this will only record an INTERNAL dependency between the object and the extension. This is mainly for use in upgrade scripts when you support pre-9.1 extension upgrades, or for internal PL code that grows into an extension after the fact. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support upgrade_extension.v1.patch.gz Description: ALTER EXTENSION UPGRADE, patch version 1
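The script-picking rule described above (ordered candidates from the control file, regex matching against the installed version, null special-cased, first match wins) could be sketched like this; the function name and the (pattern, script) pair representation are illustrative only:

```python
import re

def pick_upgrade_script(installed_version, candidates):
    """candidates is an ordered list of (version_pattern, script)
    pairs.  A pattern of None matches only a NULL installed version
    (i.e. a wrapper extension); otherwise the installed version is
    matched against the pattern, as the ~(text, text) operator would.
    The first match wins; later candidates are bypassed."""
    for pattern, script in candidates:
        if pattern is None:
            if installed_version is None:
                return script
        elif installed_version is not None and re.search(pattern, installed_version):
            return script
    return None  # no applicable upgrade script
```

Because matching stops at the first hit, the ordering of candidates in the control file is significant: a catch-all pattern like `.*` only fires when nothing more specific matched before it.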
Re: [HACKERS] regclass without error?
On 04.01.2011 11:07, Tatsuo Ishii wrote: Datum pgpool_regclass(PG_FUNCTION_ARGS) { char *pro_name_or_oid = PG_GETARG_CSTRING(0); Oid result; PG_TRY(); { result = DirectFunctionCall1(regclassin, CStringGetDatum(pro_name_or_oid)); } PG_CATCH(); { result = InvalidOid; } PG_END_TRY(); PG_RETURN_OID(result); } IMO this implementation is the least invasive but not so elegant. It's not generally safe to suppress errors like that. You could leak locks or tuple descriptors etc. And if the error is not "no such table", but e.g. out of memory, you don't want to suppress it anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
t...@fuzzy.cz wrote: - I really am not sure about the changes made in pg_proc.h. I'm not sure how to assign OIDs to the new functions (I've simply chosen values that were not used in this file), and I'm not sure about the other columns (I've copied and modified another function with the same parameter/return types) The description of the columns is at the beginning of pg_proc.h, the part that begins with CATALOG(pg_proc,1255)... The descriptions of the first 11 fields are mostly straightforward. The first fun part is that how many times the information expected in the second VARIABLE LENGTH FIELDS section repeats varies based on the parameters listed. The other thing that's usually confusing is that the types for the values are all expressed as type OID numbers. For example, if you see 25, that's the OID of the text type. You can see the whole list with: select oid,typname from pg_type; And if you go back to the file with that list in hand, a lot more of it should make sense. If you see a multiple parameter type list like 23 21, that's a function whose input values are of types (int4,int2). As for getting a new OID, if you go into src/include/catalog/ and run the unused_oids script, it will give you some help in figuring out which have been used and which not. It's not worth getting too stressed about the number you choose in the patch submission, because commits between when you got your free number and when your patch is considered for commit can make your choice worthless anyway. There's a process referred to as a catversion bump, where the database catalog version gets updated to reflect things like new pg_proc information, that committers take care of as one of the last adjustments before final commit. Doing a final correction to the OID choice is a part every committer knows to look at.
I wrote a talk that covered some additional trivia in this area, as well as other things people tend to get confused about in the source code, that you can find at http://www.pgcon.org/2010/schedule/attachments/142_HackingWithUDFs.pdf ; that might be helpful for some other things you might wonder about eventually. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] regclass without error?
It's not generally safe to suppress errors like that. You could leak locks or tuple descriptors etc. And if the error is not "no such table", but e.g. out of memory, you don't want to suppress it anyway. Thanks. I will create a more invasive patch. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
Tomas Vondra wrote: OK, so here goes the simplified patch - it tracks one reset timestamp for the background writer and for each database. Adding timestamps like this was something I wanted to do after adding pg_stat_reset_shared to 9.0, so since you've beaten me to it I'll review your patch and make sure it all works the way I was hoping instead. The whole per-table idea was never going to fly given how few people touch this area at all, but the way you're tracking things now seems reasonable. When you post an updated version of a patch that's already being tracked in the CommitFest app, please try to remember to add that update to the tracker there. I just did that for this 12/23 update, so that's covered already. Next problem is that the preferred method for submitted patches uses context diffs. See http://wiki.postgresql.org/wiki/Working_with_Git for some information about the somewhat annoying way you have to set up git to generate those. Don't worry about that for this round though. I don't care about the diff formatting given the code involved, but it's something you should sort out if you do another update. PS: I've noticed Magnus posted a patch to track recovery conflicts, adding a new view pg_stat_database_conflicts. I have not checked it yet but it should not influence this patch. I need to do some testing of that anyway, so I'll take a look at any potential clash as part of my review. I want to check how this interacts with track_functions resets too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] pg_dump --split patch
Joel Jacobson wrote: To understand a change to my database functions, I would start by looking at the top level, only focusing on the names of the functions modified/added/removed. At this stage, you want as little information as possible about each change, such as only the names of the functions. To do this, i.e. to get a list of changed functions, you cannot compare two full schema plain text dumps using diff, as it would only reveal the lines changed, not the names of the functions, unless you are lucky enough to get the name of the function within the (by default) 3 lines of copied context. While you could increase the number of copied lines of context to a value which would ensure you would see the name of the function in the diff, that is not feasible if you want to quickly get a picture of the code areas modified, since you would then need to read through even more lines of diff output. I can agree on some use cases you've outlined, where there's merit to the general idea of your patch. But as an aside, you really should launch an investigation into some better diff tools if this is how you're doing this type of work. Last week I reviewed 3K lines worth of changes from two versions of a 12K line schema dump I'd never seen before in a couple of hours using kdiff3. I'd have killed myself before finishing if I had to do the same job with traditional diff as you're describing it here. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] texteq/byteaeq: avoid detoast
On Mon, Jan 03, 2011 at 10:23:03PM -0500, Robert Haas wrote: Can you add this to the currently-open CommitFest, so we don't lose track of it? https://commitfest.postgresql.org/action/commitfest_view/open Done.
Re: [HACKERS] WIP: Range Types
2011/1/4 Jeff Davis pg...@j-davis.com: I have been updating my work in progress here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Right now, it's not in a reviewable state, but those interested can glance through the code. Quick synopsis (for illustration purposes only; don't expect much from the current code): CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); I am interested in how you define increment/decrement operations on range values for discrete types. Window functions with PARTITION also want to represent RANGE, but there's no clear solution. Sorry if it's already been discussed, since I didn't track the threads. Regards -- Hitoshi Harada
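On the increment/decrement question: one common answer for discrete subtypes is not to expose increment/decrement at all, but to canonicalize bounds using the subtype's step, so that every range has a single normal form. A hedged sketch (my own function name, an integer-like step assumed, ranges as plain bound pairs):

```python
def canonicalize(lower, upper, lower_inc, upper_inc, step=1):
    """Normalize a discrete range to the half-open convention
    [lower, upper): an exclusive lower bound moves up by the
    subtype's step, and an inclusive upper bound likewise.
    E.g. (1,4] over integers becomes [2,5).  Continuous subtypes
    like numeric have no step and thus no such canonical form."""
    if not lower_inc:
        lower += step
    if upper_inc:
        upper += step
    return (lower, upper)
```

With a canonical form like this, equality and overlap tests can compare bounds directly; whether the subtype is discrete then reduces to whether a step (or canonicalization function) was supplied when the range type was created.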
Re: [HACKERS] texteq/byteaeq: avoid detoast
Hello, I looked at the patch. Does toast_raw_datum_size work correctly on packed varlena? regards Pavel Stehule 2011/1/4 Noah Misch n...@leadboat.com: On Mon, Jan 03, 2011 at 10:23:03PM -0500, Robert Haas wrote: Can you add this to the currently-open CommitFest, so we don't lose track of it? https://commitfest.postgresql.org/action/commitfest_view/open Done. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
* Jeff Davis: 4. For the GiST penalty function, and perhaps some picksplit algorithms, it might be nice to know the length of a range, or do some other kinds of math. It introduces a lot of complexity to try to define math functions for each subtype, and try to make sure they behave sanely. So I was thinking that the user might need to specify a function that converts the subtype into a float that approximates a value's position in the total order. Doesn't the eqsel hint already provide this information? -- Florian Weimerfwei...@bfk.de BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast
Hi Pavel, On Tue, Jan 04, 2011 at 03:13:11PM +0100, Pavel Stehule wrote: I looked at the patch Thanks. Does toast_raw_datum_size work correctly on packed varlena? Yes, as best I can tell. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] back branches vs. VS 2008
On 01/04/2011 04:43 AM, Magnus Hagander wrote: OK, what's going on here is that, in the newer SDK, IPV6_V6ONLY is defined unconditionally, but IPPROTO_IPV6 is only defined if _WIN32_WINNT is set to 0x0501 or higher. We defined _WIN32_WINNT as 0x0500 until 9.0, when we changed it specifically to allow use of the right IPV6 settings. I wonder if anything else changed with that #define, though. Probably. I'm not going to suggest turning it on at this stage. There are lots of references to this specific OS level in the headers. This seems to me like a clear error in the MS headers. I don't think it makes any sense to define the settings constant but not the context constant. The fix I have suggested above doesn't seem unreasonable or terribly unsafe in these circumstances. The code clearly contemplates the setsockopt() call in question not having been run, as shown in this comment: Yeah, it seems reasonable - I assume you tested it and it doesn't fail in some *different* way than the one we expect in the code? Yes, I enabled IPV6 and set listen_addresses to * and no untoward events appeared. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Tatsuo Ishii is...@postgresql.org writes: It's not generally safe to suppress errors like that. You could leak locks or tuple descriptors etc. And if the error is not no such table but e.g. out of memory, you don't want to suppress it anyway. Thanks. I will create a more invasive patch. Why is any of this necessary? It sure looks like you are solving a problem at the wrong level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On Fri, 2010-12-31 at 11:11 +1300, Mark Kirkwood wrote: I note that this uninitialized-pages-with-standbys issue has cropped up from time to time - I wonder if in most/all of the cases folk were using Pitrtools? I deployed Pitrtools a lot when I was working for CMD, and I haven't seen any issues with that. It is just a wrapper, nothing else... Regards, -- Devrim GÜNDÜZ PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer PostgreSQL RPM Repository: http://yum.pgrpms.org Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Re: [HACKERS] regclass without error?
Why is any of this necessary? It sure looks like you are solving a problem at the wrong level. Please read upthread. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Tatsuo Ishii is...@postgresql.org writes: Why is any of this necessary? It sure looks like you are solving a problem at the wrong level. Please read upthread. You haven't made any argument why this shouldn't be solvable at the client side, or at worst with a plpgsql DO block; either of which answer would have the considerable merit of working against existing server releases. But in any case I see no reason to mess with the regclass code. The C code you want is just names = stringToQualifiedNameList(class_name); result = RangeVarGetRelid(makeRangeVarFromNameList(names), true); and there is no way that refactoring is going to yield a solution more elegant than just duplicating those two lines --- especially since regclassin has other cases for which zero is a non-error result. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote: Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
Jeff Davis pg...@j-davis.com writes: 2. We need to use the subtype's IO functions, but those may not be immutable. So, rather than create new IO functions for each range type, I was thinking that I'd use just three (anyrange_i_in, anyrange_s_in, and anyrange_v_in), and select the right one at definition time, based on the subtype's IO functions' volatility. That seems like a bit of a hack -- any better ideas? You should just do what we do for arrays and records, ie, mark the I/O functions stable. There is no reason for anyrange to have a more complicated approach to this than the existing composite-type structures do. See discussion thread here http://archives.postgresql.org/pgsql-hackers/2010-07/msg00932.php and commit here http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php 3. Right now I allow user-defined parse/deparse functions to be specified. In almost all cases, I would think that we want the text format to be something like: [ 2010-01-01, 2011-01-01 ) where the brackets denote inclusivity, and the left and right sides can be optionally double-quoted. Is it even worth having these parse/deparse functions, or should we just force the obvious format? +1 for forcing a single consistent format. I compare this to the Berkeley-era decision to let types specify nondefault array delimiters --- that was flexibility that didn't help anybody, just resulted in over-complicated code (or code that would fall over if someone tried to actually use a delimiter other than comma...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
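To make the single consistent format being argued for concrete, here is a rough Python sketch of a parser for range literals like [ 2010-01-01, 2011-01-01 ), where brackets denote inclusive bounds and parentheses exclusive ones. The function name is hypothetical, and the sketch deliberately ignores commas inside quoted bounds, infinite/missing bounds, and escaping:

```python
def parse_range_literal(text):
    """Split a literal like '[ 2010-01-01, 2011-01-01 )' into
    (lower, upper, lower_inclusive, upper_inclusive).

    Naive sketch: does not handle commas inside quoted bounds,
    empty or infinite bounds, or escape sequences.
    """
    text = text.strip()
    if text[0] not in '[(' or text[-1] not in ')]':
        raise ValueError("range literal must look like [a,b), (a,b], etc.")
    lower, upper = text[1:-1].split(',', 1)
    # Bound values may optionally be double-quoted.
    return (lower.strip().strip('"'),
            upper.strip().strip('"'),
            text[0] == '[',     # '[' means the lower bound is inclusive
            text[-1] == ']')    # ']' means the upper bound is inclusive
```

The point of forcing one format is precisely that this little parser is all anyone would ever need, rather than per-type parse/deparse hooks.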
Re: [HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?
Joel Jacobson j...@gluefinance.com writes: It's not possible to use a plain recursive query to do the trick (due to 'i' bidirectional dependencies and dependency loops). Well, I came up with this while working on some fun extension-related dependency problems; I guess it could help you:

~:5490=# WITH RECURSIVE depends AS (
    select 16385 as nsp, objid, refobjid, array[refobjid] as deps
      from pg_depend
     where refobjid = 16854 and deptype != 'p'
    UNION ALL
    select p.nsp, p.objid, d.refobjid, deps || d.refobjid
      from pg_depend d JOIN depends p ON d.objid = p.objid
     where d.deptype != 'p' and not d.refobjid = any(deps)
)
select * from depends;
  nsp  | objid | refobjid |        deps
-------+-------+----------+---------------------
 16385 | 16851 |    16854 | {16854}
 16385 | 16852 |    16854 | {16854}
 16385 | 16853 |    16854 | {16854}
 16385 | 16851 |     2200 | {16854,2200}
 16385 | 16852 |     2200 | {16854,2200}
 16385 | 16852 |    16851 | {16854,16851}
 16385 | 16853 |     2200 | {16854,2200}
 16385 | 16852 |     2200 | {16854,16851,2200}
 16385 | 16852 |    16851 | {16854,2200,16851}
(9 rows)

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?
On Tue, Jan 04, 2011 at 09:29:55AM +0100, Joel Jacobson wrote: Hi hackers, The project I'm currently working with fsnapshot[1], is written in plain plpgsql and I need to sort all the oids in their creatable/droppable order. This has already been properly implemented in pg_dump_sort.c using Knuth's algorithm for topological sorting, with some special magic to find and break dependency loops. It's not possible to use a plain recursive query to do the trick (due to 'i' bidirectional dependencies and dependency loops). I believe it is possible. I'll try to do it this evening. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
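For readers following along, here is a rough Python sketch of the kind of topological sort pg_dump_sort.c performs over pg_depend edges (this uses Kahn's formulation of the classic algorithm; pg_dump_sort.c follows Knuth). The `creation_order` name and the edge representation are inventions of this sketch, and the special loop-breaking magic pg_dump_sort.c applies when a cycle remains is deliberately left out:

```python
from collections import defaultdict, deque

def creation_order(nodes, depends_on):
    """Return nodes in a valid creation order.

    depends_on is a list of (objid, refobjid) pairs meaning objid
    depends on refobjid, so refobjid must be created first.
    """
    dependents = defaultdict(list)    # refobjid -> objids depending on it
    indegree = {n: 0 for n in nodes}  # number of unmet dependencies
    for obj, ref in depends_on:
        dependents[ref].append(obj)
        indegree[obj] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for dep in dependents[n]:     # n is created; release its dependents
            indegree[dep] -= 1
            if indegree[dep] == 0:
                ready.append(dep)

    if len(order) != len(nodes):
        # A dependency loop remains; pg_dump_sort.c breaks such loops
        # using knowledge of the object types involved, which this
        # sketch does not attempt.
        raise ValueError("dependency loop among remaining objects")
    return order
```

Feeding it the oids and edges from the recursive-query example elsewhere in this thread yields a valid creation order (every referenced object before its dependents); reversing the result gives a droppable order.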
Re: [HACKERS] ALTER EXTENSION UPGRADE patch v1
On Tue, Jan 04, 2011 at 12:31:55PM +0100, Dimitri Fontaine wrote: Hi, Please find attached the patch to support ALTER EXTENSION UPGRADE, Do you plan to have ALTER EXTENSION ... UPGRADE TO VERSION ... , or the more general, ALTER EXTENSION ... ALTER VERSION TO ... ? I get that this might not be a 9.1 feature, but it's sure to be one people who need to deploy software want. A lot. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On 2011-01-04 6:27 PM, David Fetter wrote: On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote: Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. That, unfortunately, won't work so well in REPEATABLE READ :-( But I, too, am starting to think that we should have a separate, optimized command to do UPSERT/INSERT .. IGNORE efficiently and correctly while making MERGE's correctness the user's responsibility. Preferably with huge warning signs on the documentation page. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On Tue, Jan 04, 2011 at 07:02:54PM +0200, Marko Tiikkaja wrote: On 2011-01-04 6:27 PM, David Fetter wrote: On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote: Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. That, unfortunately, won't work so well in REPEATABLE READ :-( There are caveats all over READ COMMITTED/REPEATABLE READ/SNAPSHOT. The only really intuitively obvious behavior is SERIALIZABLE, which we'll have available in 9.1. :) But I, too, am starting to think that we should have a separate, optimized command to do UPSERT/INSERT .. IGNORE efficiently and correctly while making MERGE's correctness the user's responsibility. Preferably with huge warning signs on the documentation page. +1 for the HUGE WARNING SIGNS :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, Jan 4, 2011 at 2:29 AM, Jeff Davis pg...@j-davis.com wrote: I liked Robert's suggestion here: http://archives.postgresql.org/message-id/aanlktiks_x93_k82b4f_ga634wci0oeb9ftrurf28...@mail.gmail.com which says that the user can just define a canonicalize function that will take a range as input (or perhaps the logical pieces of a range) and put it into an appropriate canonical representation. For instance, int4range_canonical might take (1,4] and turn it into [2,4]. This is similar to a few other ideas, but Robert's idea seems to require the least effort by the person defining the range type, because postgresql can still handle representation. It doesn't allow for all of the suggested features. In particular, it would not allow granules to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. Maybe I'm missing something, but it seems like this approach could support granules. You just have to define the canonicalize function in terms of the granule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
On Jan 4, 2011, at 12:46 AM, Dimitri Fontaine wrote: David E. Wheeler da...@kineticode.com writes: Just so long as you're aware that you might get more challenges on this going forward. Sure, thanks for the reminder. That said I also remember the reaction when I used to scan the SHARE/contrib directory to find the extension control file having the right name property, and I don't see scanning the same directory in order to find out which upgrade file to consider depending on several parts of its name as so different. Silly programmer! You don't have to do that yourself! You can teach the computer to do it for you. It's very good at that sort of thing! Current code allows you to use the same upgrade script for more than one source version, and does so in a way that makes it easy to determine which upgrade file to seek for. As Tom pointed out, you can do the same with naming conventions by having scripts \i each other as appropriate. I guess. I'll have to think about how to support it in PGXN, though. And the upgrade keys if they stay in. Disclaimer: the following is based on my understanding of how you want to bundle things, from several discussions we had together at pubs or on IRC; please don't read further if you've changed your mind about generating the control file from your PGXN YAML specification. s/YAML/JSON/, and okay. :-) Well, I think you're having a dependency inversion problem here. PGXN depends on extensions, not the other way round. What? That makes no sense, so I must be misunderstanding what you're trying to say. Also, I really expect the extension facility to be mainly used for internal proprietary code, mainly procedure collections, and only occasionally for publishing Open Source components. This is because you're not a Perl programmer. See CPAN. So you should be considering the control file as an input to your processes, a source file, not something that your service will hide from extension authors: there's no benefit that I can see in doing so. 
I know, but then you're not a CPAN guy. You're a Debian package guy. It's hardly surprising that we'll have inverted views of this sort of thing. Frankly, I think that you might find StackBuilder a better fit with your world view. http://pgfoundry.org/projects/stackbuilder/ Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, 2011-01-04 at 15:51 +0900, Itagaki Takahiro wrote: On Tue, Dec 28, 2010 at 22:17, Magnus Hagander mag...@hagander.net wrote: We definitely need the very basic level for 9.1, and we can always improve on it later :-) pg_stat_walsender. It would then only need the columns for procpid, usesysid, usename, client_addr, client_port, and the WALsender specific fields. Yeah, agreed. backend_start is probably the best one Here are patches for pg_stat_walsender. I split the feature into two pieces: * get_host_and_port.patch It separates the host and port formatter into a subroutine from pg_stat_activity. In addition, it makes the pg_stat_get_backend_client_addr/port() functions use the subroutine to reduce duplicated code. * pg_stat_walsender.patch It adds the pg_stat_walsender system view. It has a subset of the columns of pg_stat_activity and only one additional piece of WAL-sender-specific information via WALSndStatus(). I named the column sending location because standby servers might not have received the WAL record; if we had synchronous replication, a new sent location would be added. But the naming is still an open question. Comments welcome. There is O(max_wal_senders^2) complexity in the view, but I think it is not a serious problem because we can expect max_wal_senders to be 10 or so at most.

CREATE VIEW pg_stat_walsender AS
    SELECT S.procpid,
           S.usesysid,
           U.rolname AS usename,
           S.client_addr,
           S.client_port,
           S.backend_start,
           S.xlog_sending
      FROM pg_stat_get_walsender(NULL) AS S, pg_authid U
     WHERE S.usesysid = U.oid;

I just saw you started working on this again. Very good. I enclose some snippets of code I was working on, which I am removing from my patch in favour of your work as a separate commit. The way I coded it was a new SRF that joins to the existing pg_stat_activity. So no initdb required, and this can also easily be included as an external module for 9.0. Please notice also that my coding of the new SRF does not have the O^2 issue you mention, which I was keen to avoid. 
The sent pointer is needed whether or not we have sync rep. We should also include application name, since the user may set that in the standby for all the same reasons it is set elsewhere. Small point: please let's not call this pg_stat_walsender? pg_stat_replication_sent and pg_stat_replication_received would be easier for normal humans to understand. I would very much appreciate it if one of you could complete something here and commit in the next few days. That would then allow me to extend the view with sync-rep-specific info for monitoring and patch testing. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 346eaaf..75419b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -488,6 +488,21 @@ CREATE VIEW pg_stat_activity AS
         WHERE S.datid = D.oid AND
               S.usesysid = U.oid;
 
+CREATE VIEW pg_stat_replication_activity AS
+    SELECT
+            S.procpid,
+            S.usesysid,
+            U.rolname AS usename,
+            S.application_name,
+            S.client_addr,
+            S.client_port,
+            S.backend_start,
+            R.sent_location
+    FROM pg_stat_get_activity(NULL) AS S, pg_authid U,
+            pg_stat_get_replication_activity() AS R
+    WHERE S.usesysid = U.oid AND
+            S.procpid = R.procpid;
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e9d8847..4422f5a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -37,6 +37,7 @@
 #include <signal.h>
 #include <unistd.h>
+#include "funcapi.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_type.h"
 #include "libpq/libpq.h"
@@ -49,6 +50,7 @@
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "tcop/tcopprot.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1122,6 +1124,91 @@ WalSndWakeup(void)
 }
 
 /*
+ * Returns the Send position of walsenders with given pid.
+ * Or InvalidXLogRecPtr if none.
+ */
+Datum
+pg_stat_get_replication_activity(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext oldcontext;
+		TupleDesc	tupdesc;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+#define PG_STAT_GET_REP_ACTIVITY_COLS	2
+		tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REP_ACTIVITY_COLS, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "procpid", INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "sent_location", TEXTOID, -1, 0);
+
+		funcctx->tuple_desc =
Re: [HACKERS] can shared cache be swapped to disk?
On Sun, Dec 19, 2010 at 6:14 AM, Martijn van Oosterhout klep...@svana.org wrote: On Sat, Dec 18, 2010 at 11:59:33PM -0800, Jeff Janes wrote: On Sat, Dec 18, 2010 at 10:11 PM, flyusa2010 fly flyusa2...@gmail.com wrote: hi, folks! I see that shared cache is implemented by system v shared memory. I wonder whether data in this area can be swapped out to disk. Isn't it bad that we read data from disk, put data in shared cache, and finally data in shared cache is swapped to disk again! Why not use shmctl(..SHM_LOCK..) to pin data in main memory? Thanks! I've tried that on a recent linux kernel, to see if it would allow shared_buffers to usefully be a large fraction of total memory. It didn't help. So either swapping wasn't the problem in the first place, or the kernel ignores the order. Correct. The kernel ignores locking requests because it's a great way to DOS a machine. For example, mlock() of large blocks of memory is also not permitted for similar reasons. Does it ignore such requests in general, or only under certain situations? If the latter, do you know what those situations are? If the former, that seems incredibly bogus. There are plenty of ways to DOS a machine. The main way you prevent DOS by your own authorized users (other than firing them) on linux is by setrlimit, not by claiming to implement a feature you haven't actually implemented, or by implementing a feature but rendering it completely useless for the purpose it was intended for. RLIMIT_MEMLOCK exists, it has a small default hard limit, and only root can increase that. If root has gone out of its way to grant the postgres user a higher limit, the kernel should respect that, at least up until the situation becomes truly desperate. However, I don't have any evidence it is being ignored. I just know that locking the shared memory did not improve things, but I didn't verify that shared memory getting swapped out was the problem in the first place. 
The way you make sure shared memory doesn't get swapped out is to make sure it gets used (i.e. don't give 2GB shared memory when your database is 100MB). And don't make your shared memory so large that you're creating significant memory pressure, otherwise the kernel might choose to swap out your shared memory rather than, say, the webserver. Your shared memory should be reasonably sized, but you should make sure the kernel has enough cache memory it can throw away first. Unfortunately it is hard to know what the kernel considers to be significant memory pressure. My experience (from mostly non-pgsql work) is that the kernel has what I would consider enough cache memory to throw away, but for some reason doesn't throw it away and does more counterproductive things instead. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 12:21 -0500, Robert Haas wrote: It doesn't allow for all of the suggested features. In particular, it would not allow granules to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. Maybe I'm missing something, but it seems like this approach could support granules. You just have to define the canonicalize function in terms of the granule. I meant that it doesn't support them as an explicit, user-visible concept. The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because, (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. While we're talking about it, one question I had is: should the canonicalize function be: /* works on the deserialized information right before serialization */ canonical(flags, lower_bound, upper_bound) or /* works on the serialized form right after serialization */ range = canonical(range) I would lean toward the latter because it's simpler on the user (and allows non-C functions). But perhaps an efficiency argument could be made for the former because it could avoid one round of deserialize/reserialize when the representation is not already in canonical form. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
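As a concrete illustration of the logical-pieces variant of the canonicalize function, here is a hedged Python sketch for a discrete integer range, following the int4range_canonical example above where (1,4] becomes [2,4]. The function name and the (lower, upper, lower_inc, upper_inc) tuple are inventions of this sketch, not the proposed API:

```python
def int_range_canonical(lower, upper, lower_inc, upper_inc):
    """Normalize an integer range so both bounds are inclusive,
    e.g. (1,4] -> [2,4] as in the int4range_canonical example."""
    if not lower_inc:
        lower += 1            # (1, ...  becomes  [2, ...
        lower_inc = True
    if not upper_inc:
        upper -= 1            # ..., 4)  becomes  ..., 3]
        upper_inc = True
    if lower > upper:
        # A real implementation would presumably produce an empty
        # range here; raising keeps the sketch short.
        raise ValueError("range normalizes to empty")
    return lower, upper, lower_inc, upper_inc
```

A granule other than 1 would simply round the bounds in the same two branches, which is how the canonicalize approach can subsume explicit granules without making them a user-visible concept.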
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 14:18 +, Florian Weimer wrote: * Jeff Davis: 4. For the GiST penalty function, and perhaps some picksplit algorithms, it might be nice to know the length of a range, or do some other kinds of math. It introduces a lot of complexity to try to define math functions for each subtype, and try to make sure they behave sanely. So I was thinking that the user might need to specify a function that converts the subtype into a float that approximates a value's position in the total order. Doesn't the eqsel hint already provide this information? Can you clarify what you mean? I don't know what the eqsel hint is. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
All, This is a pointless argument. Eventually, we will be implementing all possible sync rep configurations, because different users *need* different configurations. Some users care more about durability, some more about availability, and some more about response time. And you can't have all three, which was my point about A, D, R (also the point of CAP). For that matter, any single configuration will be useful to a large number of users, and an even larger number will be able to work around it while they wait for 9.2. Further, the knowledge we gain by having some kind of synch rep in the field will allow us to implement the different configurations correctly, which *no* amount of arguing on e-mail will. The perfect is the enemy of the good. The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On 1/2/11 12:35 AM, Heikki Linnakangas wrote: Very likely. A synchronous standby can bring the master to a halt, while an asynchronous one is rather harmless. If I were a DBA, and the data wasn't very sensitive, I would liberally hand out async privileges to my colleagues to set up reporting standbys, test servers etc. But I would *not* give them synchronous privileges, because sooner or later one would go hmm, I wonder what happens if I make this synchronous, or haphazardly copy the config file from a synchronous standby. That would either bring down the master, or act as a fake standby, acknowledging commits before they're flushed to the real synchronous standby. Either one would be bad. That's not very likely. Shops with enough DBAs to actually have a hierarchy of database access are rare indeed; the average business has trouble employing *one* fulltime DBA. Also, you're mistaken if you think that DBAs wouldn't restrict the ability of development users to set up asynch connections; those have a significant impact on the production server as well. However, the possibility of *accidental* DOS is a bit more likely, per accidentally using the wrong config file. I don't think you can guard against that with permissions, though; if a DBA is copying the config file from a synchronous server accidentally, presumably he's copying the .pgpass file as well, and then the new server is identical to the legit synch rep servers as far as the master is concerned. So, again, I don't agree that a separate synchrep permission is useful, or warranted. However, your arguments *do* make me backpedal on the issue of having a list of synch rep roles on the master. I can easily imagine a DBA needing to rapidly disable synch rep if replication is failing, without taking the time to log in to several separate standbys, one or more of which might be improperly configured and inaccessible. I can't think of a simpler way to do that than having a synch_rep_roles configuration on the master. 
That would also handle control issues for the senior DBA, since you'd need superuser access to the master to modify it. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 4, 2011 at 12:48 PM, Simon Riggs si...@2ndquadrant.com wrote: The sent pointer is needed whether or not we have sync rep. We should also include application name, since the user may set that in the standby for all the same reasons it is set elsewhere. Small point: please let's not call this pg_stat_walsender? pg_stat_replication_sent and pg_stat_replication_received would be easier for normal humans to understand. Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders > 0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders > 0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* How about one view, called pg_stat_replication? This would be clear even to newbies, which none of the other view names would ... -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 23:04 +0900, Hitoshi Harada wrote: 2011/1/4 Jeff Davis pg...@j-davis.com: I have been updating my work in progress here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Right now, it's not in a reviewable state, but those interested can glance through the code. Quick synopsis (for illustration purposes only; don't expect much from the current code): CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); I am interested in how you define increment/decrement operation of range value in discrete types. The window functions and PARTITION also want to represent RANGE but there's no clear solution. Sorry if it's already been discussed since I didn't track the threads. The user would specify a canonical function like: CREATE TYPE int4range AS RANGE (SUBTYPE=int4, SUBTYPE_CMP=btint4cmp, CANONICAL=my_int4range_canonical); That function would be called when constructing ranges on input or after a computation, and could change something like (1,4] into [2,4] if you prefer the latter form. So the range types would not have increments, decrements, granules, or knowledge about the difference type (e.g. interval is the difference type for timestamp). What support do you need/want from range types to help with new window function features? Also, partitioning might have some use for range types to represent range partitions. Comments are welcome. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
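Jeff's canonicalization idea is easy to illustrate outside the backend. Here is a minimal Python sketch (purely illustrative; the real CANONICAL function would be attached to the range type in C or SQL) of normalizing discrete integer-range bounds so that (1,4] becomes [2,4]:

```python
def canonical(lo, hi, lo_inc, hi_inc):
    """Normalize an integer range to closed-closed form, the way a
    CANONICAL function could turn (1,4] into [2,4]."""
    if not lo_inc:   # open lower bound: bump it up one step
        lo += 1
    if not hi_inc:   # open upper bound: pull it down one step
        hi -= 1
    return lo, hi, True, True
```

For example, canonical(1, 4, False, True) and canonical(2, 5, True, False) both yield (2, 4, True, True), which is exactly why textually different ranges can compare equal after canonicalization, without the range type needing any built-in notion of increments or granules.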
Re: [HACKERS] Sync Rep Design
On Tue, 2011-01-04 at 10:28 -0800, Josh Berkus wrote: The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. I think you're right that trying to please everyone is not going to be possible in this release, because of the explosion of parameter combinations that require testing and because of the explosion in my head that causes. I'm not feeling well now, so I'm going to go to bed, not just to avoid snapping at people. Even given that short interlude, I see no problem about delivery. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Sun, Jan 2, 2011 at 4:19 PM, Simon Riggs si...@2ndquadrant.com wrote: On Sun, 2011-01-02 at 18:54 +0200, Heikki Linnakangas wrote: I believe we all agree that there's different use cases that require different setups. Both first-past-the-post and wait-for-all-to-ack have their uses. Robert's analysis is that first-past-the-post doesn't actually improve the durability guarantee (according to his calcs). Which means that 1 primary, 2 sync standbys with first-past-the-post is actually worse than 1 primary, 1 sync and 1 async standby in terms of its durability guarantees. So ISTM that Robert does not agree that both have their uses. I think it depends on what failure modes you want to protect against. If you have a primary in New York, a secondary in Los Angeles, and another secondary in London, you might decide that the chances of two standbys being taken out by the same event are negligible, or alternatively that if one event does take out both of them, it'll be something like a meteor where you'll have bigger things to worry about than lost transactions. In that case, requiring one ACK but not two is pretty sensible. If the primary goes down, you'll look at the two remaining machines (which, by presumption, will still be up) and promote whichever one is ahead. In this setup, you get a performance benefit from waiting for either ACK rather than both ACKs, and you haven't compromised any of the cases you care about. However, if you have the traditional close/far setup, things are different. Suppose you have a primary and a secondary in New York and another secondary in Los Angeles. Now it has to be viewed as a reasonable possibility that you could lose the New York site. If that happens, you need to be able to promote the LA standby *without reference to the NY standby*. So you really can't afford to do the 1-of-2 thing, because then when NY goes away you're not sure whether the LA standby is safe to promote. So, IMHO, it just depends on what you want to do. 
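Robert's durability argument can be made concrete with a toy model (my own back-of-envelope arithmetic, not figures from the thread): assume each server fails independently with probability p during the window of interest, and compare requiring one ack versus all acks from the standbys.

```python
from math import comb

def loss_probability(p, acked_standbys):
    """A committed transaction is lost only if the primary and every
    standby known to hold the commit all fail."""
    return p ** (1 + acked_standbys)

def commit_block_probability(p, required_acks, n_standbys):
    """A commit cannot complete if fewer than required_acks standbys
    are up: sum over the number of surviving standbys k < required_acks."""
    return sum(comb(n_standbys, k) * (1 - p) ** k * p ** (n_standbys - k)
               for k in range(required_acks))
```

With p = 0.01 and two standbys, 1-of-2 acks gives a loss probability of about 1e-4, the same guarantee as a single sync standby, which is Robert's point; 2-of-2 acks improves loss to about 1e-6 but blocks commits roughly 2% of the time. The numbers change with the failure-correlation assumptions (the New York/London scenario), which is exactly why no single configuration fits everyone.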
I'm not sure what the point of such a timeout in general is, but people have requested that. Again, this sounds like you think a timeout has no measurable benefit, other than to please some people's perceived needs. The wait-for-all-to-ack looks a lot less ridiculous if you also configure a timeout and don't wait for disconnected standbys. Does it? Do Robert, Stefan and Aidan agree? What are the availability and durability percentages if we do that? Based on those, we may decide to do that instead. But I'd like to see some analysis of your ideas, not just a we could. Since nobody has commented on my analysis, let's see someone else's. Here's my take on this point. I think there is a use case for waiting for a disconnected standby and a use case for not waiting for a disconnected standby. The danger of NOT waiting for a disconnected standby is that if you then go on to irretrievably lose the primary, you lose transactions. But on the other hand, if you do wait, you've made the primary unavailable. I don't know that there's one right answer here. For some people, if they can't be certain of recording the transaction in two places, then it may be better to not process any transactions at all. For other people, it may be better to process transactions unprotected for a while while you get a new standby up. It's not for us to make that judgment; we're here to provide options. Having said that, I am OK with whichever one we want to implement first so long as we keep the door open to doing the other one later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
I'm not feeling well now, so I'm going to go to bed, not just to avoid snapping at people. Even given that short interlude, I see no problem about delivery. Cool! Thanks Simon. Feel better. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, 2011-01-04 at 10:50 -0800, Josh Berkus wrote: Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* How about one view, called pg_stat_replication? This would be clear even to newbies, which none of the other view names would ... hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. Joshua D. Drake -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 4, 2011 at 20:28, Josh Berkus j...@agliodbs.com wrote: hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. That seems kind of backwards though - given that the view only contains data on the master... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER EXTENSION UPGRADE patch v1
David Fetter da...@fetter.org writes: Do you plan to have ALTER EXTENSION ... UPGRADE TO VERSION ... , or the more general, ALTER EXTENSION ... ALTER VERSION TO ... ? Well why not, but I'm not sure I understand what you have in mind here. I don't exactly see how to install more than one version on the server, at the filesystem level, with the current infrastructure. The way debian solves it for multi-version packages, such as PostgreSQL for example, is to add the (major) version number in the package name, so that you have independent packages such as: postgresql-8.3 postgresql-8.4 postgresql-9.0 Are you thinking about doing something similar with extensions? The only other way I'm able to think about your request would be for the command to simply ERROR out when the premises given by the users don't match the reality of what's installed both in the catalogs and on the file system. If that's what you're talking about, we still can make it material for 9.1, I believe. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 4, 2011 at 2:31 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 4, 2011 at 20:28, Josh Berkus j...@agliodbs.com wrote: hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. That seems kind of backwards though - given that the view only contains data on the master... I think pg_stat_replication is better than pg_stat_standby, but I'm still not convinced we shouldn't go with the obvious pg_stat_walsenders. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
David E. Wheeler da...@kineticode.com writes: As Tom pointed out, you can do the same with naming conventions by having scripts \i each other as appropriate. This is a deprecated idea, though. We're talking about the pg_execute_from_file() patch that has been applied, but without the pg_execute_sql_file() function. So that part is internal to the backend extension code and not available from SQL anymore. There's no consensus to publish a backend \i-like function. So there's no support for the upgrade-script organization you're promoting. Unless the consensus changes again (but a commit has been done). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On 01/04/2011 07:51 PM, Simon Riggs wrote: On Tue, 2011-01-04 at 10:28 -0800, Josh Berkus wrote: The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. ah cool - like that approach for 9.1! Thanks for working on that! Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, Jan 4, 2011 at 1:18 PM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2011-01-04 at 12:21 -0500, Robert Haas wrote: It doesn't allow for all of the suggested features. In particular, it would not allow granules to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. Maybe I'm missing something, but it seems like this approach could support granules. You just have to define the canonicalize function in terms of the granule. I meant that it doesn't support them as an explicit, user-visible concept. The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because, (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. While we're talking about it, one question I had is: should the canonicalize function be: /* works on the deserialized information right before serialization */ canonical(flags, lower_bound, upper_bound) or /* works on the serialized form right after serialization */ range = canonical(range) I would lean toward the latter because it's simpler on the user (and allows non-C functions). Yeah, me too. But perhaps an efficiency argument could be made for the former because it could avoid one round of deserialize/reserialize when the representation is not already in canonical form. I believe this might be an appropriate time to apply Knuth's Law. I'm not thrilled with the amount of palloc overhead we have in the backend, but absent some evidence that this case is going to be particularly significant, I'd be disinclined to contort the interface. I suspect that if you run oprofile this won't be the bottleneck. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
On Jan 4, 2011, at 11:48 AM, Dimitri Fontaine wrote: As Tom pointed out, you can do the same with naming conventions by having scripts \i each other as appropriate. This is a deprecated idea, though. We're talking about the pg_execute_from_file() patch that has been applied, but without the pg_execute_sql_file() function. So that part is internal to the backend extension code and not available from SQL anymore. There's no consensus to publish a backend \i-like function. So there's no support for the upgrade-script organization you're promoting. Unless the consensus changes again (but a commit has been done). To be clear, consensus was not reached, by my reading. It may be that it makes sense to restore pg_execute_sql_file(), perhaps to run only in the context of ALTER EXTENSION. Just to be clear where I'm coming from, as an extension developer, I would like PostgreSQL extensions to: * Prefer convention over configuration * Not make me do more work than the computer can do Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Tue, Jan 4, 2011 at 2:50 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: On 01/04/2011 07:51 PM, Simon Riggs wrote: On Tue, 2011-01-04 at 10:28 -0800, Josh Berkus wrote: The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. ah cool - like that approach for 9.1! Yeah, I like that idea too, on first blush. I think we should think it over and see whether we're committing ourselves to any design decisions we may later regret - what parameters will we need to add from that point to get all the configs we ultimately want to support? But it seems a reasonable starting point, and we can argue about the rest once we have working code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On 04.01.2011 21:43, Robert Haas wrote: On Tue, Jan 4, 2011 at 2:31 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 4, 2011 at 20:28, Josh Berkus j...@agliodbs.com wrote: hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. That seems kind of backwards though - given that the view only contains data on the master... I think pg_stat_replication is better than pg_stat_standby, but I'm still not convinced we shouldn't go with the obvious pg_stat_walsenders. How about pg_stat_replication_activity? If I understood correctly, the view is similar to pg_stat_activity, but displays information about connected standbys rather than regular backends. It's a bit of a long name, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 04, 2011 at 10:50:12AM -0800, Josh Berkus wrote: Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders > 0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* How about one view, called pg_stat_replication? This would be clear even to newbies, which none of the other view names would ... Wait. We can't do that. We'd be breaking a decades-old tradition of terrible names! ;) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
David E. Wheeler da...@kineticode.com writes: * Prefer convention over configuration The previous idea about the convention is not flying well with the very recent proposal of ALTER EXTENSION ... UPGRADE TO VERSION ..., because it would certainly require that the extension's name include its major version number, like debian is doing for a number of packages. Also, how are PostGIS 1.4 and 1.5 (and 2.0) packaged nowadays? * Not make me do more work than the computer can do No computer will guess reliably which upgrade file to apply given the currently installed version and the newer one, as soon as the same file can get used for more than a single combination of those two strings. I much prefer to avoid shipping that many files, and think that even in the worst case where you have to add a setup line per supported upgrade setup, the control file support for that is better. Now I perfectly understand that there's more to this world than my eyes can see, that's why we're talking about alternatives. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking latest timeline in standby mode
On 02.11.2010 07:15, Fujii Masao wrote: On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yeah, that's one approach. Another is to validate the TLI in the xlog page header, it should always match the current timeline we're on. That would feel more robust to me. Yeah, that seems better. I finally got around to look at this. I wrote a patch to validate that the TLI on xlog page header matches ThisTimeLineID during recovery, and noticed quickly in testing that it doesn't catch all the cases I'd like to catch :-(. The problem scenario is this:

    TLI 1 ---+C---+---   Standby
             .
             .
    TLI 2    +C---+---

The two horizontal lines represent two timelines. TLI 2 forks off from TLI 1, because of a failover to a not-completely up-to-date standby server, for example. The plus-signs represent WAL segment boundaries and C's represent checkpoint records. Another standby server has replayed all the WAL on TLI 1. Its latest restartpoint is C. The checkpoint records on the different timelines are at the same location, at the beginning of the WAL files - not all that impossible if you have archive_timeout set, for example. Now, if you stop and restart the standby, it will try to recover to the latest timeline, which is TLI 2. But before the restart, it had already replayed the WAL from TLI 1, so it's wrong to replay the WAL from the parallel universe of TLI 2. At the moment, it will go ahead and do it, and you end up with an inconsistent database. I planned to fix that by checking the TLI on the xlog page header, but that alone isn't enough in the above scenario. The TLI on the page headers on timeline 2 are what's expected; the first page on the segment has TLI==1, because it was just forked off from timeline 1, and the subsequent pages have TLI==2, as they should after the checkpoint record. So we have to remember which timeline we were on before the restart. 
We already remember how far we had replayed, that's the minRecoveryPoint we store in the control file, but we have to record the timeline along with it. On reflection, your idea of checking the history file before replaying anything seems much easier. We'll still need to add the timeline alongside minRecoveryPoint to do the checking, but it's a lot easier to do against the history file. And we can validate the TLIs on page headers against the information from the history file as we read in the WAL. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
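The history-file check described above boils down to a simple rule, sketched here in Python (a simplified model, not the actual recovery code): the TLI stamped on each WAL page must match the timeline that the history says was in effect at that page's position.

```python
def tli_at(history, lsn):
    """history: list of (tli, switch_lsn) pairs in ascending order; the
    last entry has switch_lsn None (the current timeline). Returns the
    timeline in effect at the given WAL position."""
    for tli, switch_lsn in history:
        if switch_lsn is None or lsn < switch_lsn:
            return tli

def page_header_ok(history, page_lsn, page_tli):
    """Validate a WAL page header's TLI against the timeline history."""
    return page_tli == tli_at(history, page_lsn)
```

In the problem scenario, a page from TLI 2's fork segment correctly carries TLI 1 before the switch point and TLI 2 after it, so this check alone passes; hence the need to also remember which timeline was being replayed before the restart.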
Re: [HACKERS] Upgrading Extension, version numbers
On Jan 4, 2011, at 12:05 PM, Dimitri Fontaine wrote: David E. Wheeler da...@kineticode.com writes: * Prefer convention over configuration The previous idea about the convention is not flying well with the very recent proposal of ALTER EXTENSION ... UPGRADE TO VERSION ..., because it would certainly require that the extension's name include its major version number, like debian is doing for a number of packages. No, just the file. Also, how are PostGIS 1.4 and 1.5 (and 2.0) packaged nowadays? Tarballs. * Not make me do more work than the computer can do No computer will guess reliably which upgrade file to apply given the currently installed version and the newer one, as soon as the same file can get used for more than a single combination of those two strings. Why not? Version numbers would have to be part of the file names. The only wrinkle is being able to properly order version numbers, and we could address that by requiring a specific version format. Tom suggested integers; I suggested semantic versions. I much prefer to avoid shipping that many files, and think that even in the worst case where you have to add a setup line per supported upgrade setup, the control file support for that is better. Well, for a version that requires no upgrade script, there just wouldn't be one. It's a matter of taste. Now I perfectly understand that there's more to this world than my eyes can see, that's why we're talking about alternatives. You are? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
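The versions-in-filenames idea implies that the upgrade file to apply is mechanically discoverable. A sketch of that (the `ext--FROM--TO.sql` naming here is purely illustrative, not a convention agreed in the thread) which chains upgrade scripts by graph search:

```python
import re
from collections import deque

def upgrade_path(filenames, src, dst):
    """Find a chain of upgrade scripts named ext--FROM--TO.sql leading
    from version src to version dst; returns None if no chain exists."""
    edges = {}
    for f in filenames:
        m = re.match(r'^.+?--(.+?)--(.+?)\.sql$', f)
        if m:  # record an upgrade edge FROM -> TO provided by this file
            edges.setdefault(m.group(1), []).append((m.group(2), f))
    queue, seen = deque([(src, [])]), {src}
    while queue:  # breadth-first search yields a shortest script chain
        version, path = queue.popleft()
        if version == dst:
            return path
        for nxt, f in edges.get(version, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f]))
    return None
```

This is the convention-over-configuration side of the argument: given 'myext--1.0--1.1.sql' and 'myext--1.1--2.0.sql', the computer can chain 1.0 to 2.0 without any control-file setup lines. Dimitri's counterpoint, that one script may serve several source versions, is precisely what this scheme cannot express without shipping duplicate files.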
[HACKERS] Fixing GIN for empty/null/full-scan cases
I've been thinking about how to fix GIN's assorted corner-case problems, as has been discussed several times, most recently here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg00521.php See also http://wiki.postgresql.org/wiki/Todo#GIN There are basically three related issues: 1. GIN doesn't store anything in the index for a NULL item. 2. GIN doesn't store anything in the index for an empty (zero-key) item. 3. GIN can't deal with NULL key values. (An item is a composite value to be indexed, such as a tsvector or array. A key is an individual indexable value, such as a lexeme or array element.) Because of #1 and #2, GIN can't handle full-index scans. This is not because the code can't scan all of the index, but because doing so wouldn't necessarily return a TID for every existing heap row. We have to fix #1 and #2, and then get rid of the prohibition on full-index scans, in order to deal with the complaints that appear in our TODO list. The problem with NULL key values is somewhat less pressing, because most GIN-indexable operators are strict enough to not care about null keys, so we could perhaps just ignore nulls. But I'm inclined to think that it'd be best to fix that now while we're whacking the code and opclass APIs around, rather than probably having to go back for yet another round later. A concrete example of a hit we'll take if we don't index nulls is that array-is-contained-in would have to be treated as a lossy rather than lossless index search, since there'd be no way to tell from the index whether the array item contains any nulls (rendering it not contained in anything). As far as storage in the index goes, NULL key values seem perfectly simple to deal with: just allow a null to get stored for the key value of a GIN index entry. What we discussed doing to fix #1 and #2 was to store a dummy index entry, rather than no entries at all. The least complicated way to do that would be to store a NULL key. 
That would mean that, for example with integer arrays, all three of these item values would have identical index entries: NULL::int[] '{}'::int[] '{NULL}'::int[] So any searches that need to yield different answers for these cases (ie find some but not all of them) would have to be treated as lossy. That's not necessarily a show-stopper, but I'm inclined to think that it is worth working a bit harder so that we can distinguish them. It's already true that GIN requires special-case code to construct its index entries (look at GinFormTuple). What I'm thinking we could do without too much additional ugliness is store either the key value (for the normal, non-null key case) or an int16 representing a category: 1 = null key value 2 = placeholder for zero-key item 3 = placeholder for null item The index entry's IndexTupleHasNulls flag would be sufficient to distinguish whether a key or a category flag is present, since there are no other potentially-null values in a GIN index entry. There is room in this scheme for more categories if we ever need any, though I can't think of what they'd be. The other sticky problem is how to extend the GIN opclass support function API definitions for all this. I propose the following: compare(): doesn't really need any changes. Just as in btree, we only need to call the key comparison function for non-null keys. The various categories of nulls can have hard-wired comparison behavior. extractValue(): needs an extension so it can return null key values. I propose adding a third argument: Datum *extractValue(Datum inputValue, int32 *nkeys, bool **nullFlags) If the function wants to return any nulls, it has to palloc an array of nkeys bools and store a pointer to it at *nullFlags. We can initialize *nullFlags to NULL (implying no null keys) for backwards compatibility with existing functions that aren't aware of the third argument. 
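The proposed entry encoding can be modeled in a few lines (a Python simulation of the idea, not the actual GinFormTuple changes); it shows how the three troublesome items above map to distinct index entries:

```python
# The three key categories from the proposal (hard-wired sort order).
NULL_KEY, EMPTY_ITEM, NULL_ITEM = 1, 2, 3

def gin_entries(item):
    """Index entries for an array-like item: each entry is either a real
    key or a category placeholder, per the proposed scheme."""
    if item is None:                      # NULL::int[]
        return [('category', NULL_ITEM)]
    if not item:                          # '{}'::int[]
        return [('category', EMPTY_ITEM)]
    return [('category', NULL_KEY) if k is None else ('key', k)
            for k in item]               # '{...}' with possible NULL elements
```

With this encoding, NULL::int[], '{}'::int[] and '{NULL}'::int[] produce three distinct entry sets, so searches that must treat them differently need not fall back to lossy behavior.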
In the case of a null item value, we needn't call the function at all, we can just generate the dummy entry directly. Zero-key items can be handled compatibly with the current behavior: if nkeys is returned as zero, we'll generate a dummy entry instead of generating nothing at all.

extractQuery(): likewise, needs to be able to return null query element values. I propose adding a sixth argument:

    Datum *extractQuery(Datum query, int32 *nkeys, StrategyNumber n, bool **pmatch, Pointer **extra_data, bool **nullFlags)

As above, we can initialize *nullFlags to NULL for backwards compatibility. We will continue to assume that a null query value means an unsatisfiable query, so we don't need to be able to call extractQuery with a null input query. We'll keep the current convention that returning nkeys = -1 means an unsatisfiable query while returning zero requests a full index scan.

consistent(): needs to be able to deal with possible nulls in the extracted query
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: * Existing GIN indexes are upwards compatible so far as on-disk storage goes, but they will of course be missing entries for empty, null, or null-containing items. Users who want to do searches that should find such items will need to reindex after updating to 9.1. This is the only part of this proposal that bothers me a little bit. It would be nice if the system could determine whether a GIN index was upgraded from 9.0 or earlier and thus doesn't contain these entries - and avoid trying to use the index for these sorts of queries in cases where it might return wrong answers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: * Existing GIN indexes are upwards compatible so far as on-disk storage goes, but they will of course be missing entries for empty, null, or null-containing items. Users who want to do searches that should find such items will need to reindex after updating to 9.1. This is the only part of this proposal that bothers me a little bit. It would be nice if the system could determine whether a GIN index is upgraded from 9.0 or earlier and thus doesn't contain these entries - and avoid trying to use the index for these sorts of queries in cases where it might return wrong answers. I don't think it's really worth the trouble. The GIN code has been broken for these types of queries since day one, and yet we've had only maybe half a dozen complaints about it. Moreover there's no practical way to avoid trying to use the index, since in many cases the fact that a query requires a full-index scan isn't determinable at plan time. The best we could really do is throw an error at indexscan start, and that doesn't seem all that helpful. But it probably wouldn't take much code either, if you're satisfied with that answer. (I'm envisioning adding a version ID to the GIN metapage and then checking that before proceeding with a full-index scan.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 05/01/11 04:43, Devrim GÜNDÜZ wrote: On Fri, 2010-12-31 at 11:11 +1300, Mark Kirkwood wrote: I note that this issue of uninitialized pages with standbys has cropped up from time to time - I wonder if in most/all the cases folk were using Pitrtools? I deployed Pitrtools a lot when I was working for CMD, and I haven't seen any issues with that. It is just a wrapper, nothing else... Note that if you perform the backup step when the system is quiet then there is typically no problem with the standby - it is only when the backup coincides with any level of activity that you can elicit the previously mentioned problems. Obviously the case I have included in the first mail shows up the problem immediately, but in fact it is reasonably hard to trigger and I spent quite a while getting to the state of being able to trigger the uninitialized pages on demand. While Pitrtools is a wrapper, its use of --exclude=backup_label is incorrect and needs to be removed. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
Josh Berkus j...@postgresql.org writes: So, again, I don't agree that a separate synchrep permission is useful, or warranted. +1 However, your arguments *do* make me backpedal on the issue of having a list of synch rep roles on the master. I can easily imagine a DBA needing to rapidly disable synch rep if replication is failing, without taking the time to log in to several separate standbys, one or more of which might be improperly configured and inaccessible. I can't think of a simpler way to do that than having a synch_rep_roles configuration on the master. That would also handle control issues for the senior DBA, since you'd need superuser access to the master to modify it. What about the HBA here? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
What about the HBA here? Hmmm. That's tempting; a synchronous HBA option instead of a GUC? But that doesn't solve the problem of "standby #6 is failing, I want to kick it off synch rep." I'd be opposed to having a GUC *and* an HBA setting; making DBAs set things independently in two places just frustrates our users. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
Josh Berkus j...@postgresql.org writes: What about the HBA here? Hmmm. That's tempting; a synchronous HBA instead of a GUC? But that doesn't solve the problem of "standby #6 is failing, I want to kick it off synch rep." I'd be opposed to having a GUC *and* an HBA. Making DBAs set things independently in two places just frustrates our users. I was just thinking that you could prepend a reject line at the right place in the file. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
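For illustration, the prepend-a-reject-line idea might look like this in pg_hba.conf (addresses and role name invented; the point is that the earlier line matches first and shuts out one standby without touching anything else):

```
# kick standby #6 (192.0.2.6) out of replication; first match wins
host  replication  repuser  192.0.2.6/32  reject
# the normal rule that all other standbys continue to match
host  replication  repuser  192.0.2.0/24  md5
```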
Re: [HACKERS] can shared cache be swapped to disk?
On Tue, Jan 04, 2011 at 09:51:05AM -0800, Jeff Janes wrote: Correct. The kernel ignores locking requests because it's a great way to DOS a machine. For example, mlock() of large blocks of memory is also not permitted for similar reasons. Does it ignore such requests in general, or only under certain situations? If the latter, do you know what those situations are? Well, not in general, but for shared memory it's ignored (not sure whether that holds if you're root). It used to be that shared memory was always locked, which sounds like a great idea, until people started abusing it. So now shared memory is on the same footing as other memory. Not sure where I read this, I know it came up several years ago. I think it changed back in 2.0 times. RLIMIT_MEMLOCK exists, it has a small default hard limit, and only root can increase that. If root has gone out of its way to grant the postgres user a higher limit, the kernel should respect that, at least up until the situation becomes truly desperate. Like I said, not sure about how it works for root. Unfortunately it is hard to know what the kernel considers to be significant memory pressure. My experience (from mostly non-pgsql work) is that the kernel has what I would consider enough cache memory to throw away, but for some reason it doesn't throw it away and instead does more counterproductive things. Possibly. Everyone always considers their memory to be more important than all other memory on the system, but the kernel has a much better idea of what's going on than the user. That doesn't mean it's without fault or couldn't be improved. But if there's a bunch of shared memory not being accessed very often and the kernel thinks it's better used somewhere else, it may be right. Repeatable test cases in this area are really hard. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. 
- Charles de Gaulle signature.asc Description: Digital signature
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On Tue, Jan 4, 2011 at 4:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: * Existing GIN indexes are upwards compatible so far as on-disk storage goes, but they will of course be missing entries for empty, null, or null-containing items. Users who want to do searches that should find such items will need to reindex after updating to 9.1. This is the only part of this proposal that bothers me a little bit. It would be nice if the system could determine whether a GIN index is upgraded from 9.0 or earlier and thus doesn't contain these entries - and avoid trying to use the index for these sorts of queries in cases where it might return wrong answers. I don't think it's really worth the trouble. The GIN code has been broken for these types of queries since day one, and yet we've had only maybe half a dozen complaints about it. Moreover there's no practical way to avoid trying to use the index, since in many cases the fact that a query requires a full-index scan isn't determinable at plan time. The best we could really do is throw an error at indexscan start, and that doesn't seem all that helpful. But it probably wouldn't take much code either, if you're satisfied with that answer. (I'm envisioning adding a version ID to the GIN metapage and then checking that before proceeding with a full-index scan.) I'd be satisfied with that answer. It at least makes it a lot more clear when you've got a problem. If this were a more common scenario, I'd probably advocate for a better solution, but the one you propose seems adequate given the frequency of the problem as you describe it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On 1/4/11 1:49 PM, Tom Lane wrote: I don't think it's really worth the trouble. The GIN code has been broken for these types of queries since day one, and yet we've had only maybe half a dozen complaints about it. Moreover there's no practical way to avoid trying to use the index, since in many cases the fact that a query requires a full-index scan isn't determinable at plan time. Actually, there's been a *lot* of complaining about the GIN issues. It's just that most of that complaining doesn't reach -hackers. The common pattern I've seen in our practice and on IRC is:

1) user has GiST indexes
2) user tries converting them to GIN
3) user gets full index scan errors
4) user switches back and gives up

I agree that backwards compatibility should not be a priority; it is sufficient to tell users to reindex. For one thing, anyone who *is* using GIN presently will have written their application code to avoid full index scans. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
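The failure pattern at step 3 can be reproduced with a sketch along these lines (table, index, and data are invented; before 9.1 the final query fails with an error like "GIN indexes do not support whole-index scans"):

```sql
CREATE TABLE docs (tags int[]);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);
INSERT INTO docs VALUES ('{1,2}'), ('{}'), (NULL), ('{NULL}');

-- The last three rows get no GIN entries at all before 9.1, so a
-- "contained in" search, which must consider them, needs a full-index
-- scan and errors out instead of returning '{1,2}' and '{}'.
SELECT * FROM docs WHERE tags <@ '{1,2,3}';
```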
Re: [HACKERS] Sync Rep Design
I was just thinking that you could prepend a reject line at the right place in the file. Hmmm, that's worth thinking about. How do others feel about this? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER EXTENSION UPGRADE patch v1
On Tue, Jan 04, 2011 at 08:31:19PM +0100, Dimitri Fontaine wrote: David Fetter da...@fetter.org writes: Do you plan to have ALTER EXTENSION ... UPGRADE TO VERSION ... , or the more general, ALTER EXTENSION ... ALTER VERSION TO ... ? Well why not, but I'm not sure I understand what you have in mind here. One could imagine that an extension was updated more quickly than PostgreSQL major versions come out, or at least not at the exact same time. I don't exactly see how to install more than one version on the server, at the filesystem level, with the current infrastructure. It's probably not worth worrying about the multiple version issue for 9.1, but it probably *is* for 9.2. The way debian solves it for multi-versions packages, such as PostgreSQL for example, is to add the (major) version number in the package name, so that you have independent packages such as: postgresql-8.3 postgresql-8.4 postgresql-9.0 Are you thinking about doing something similar with extensions? Vaguely. See above. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On 1/4/11 10:18 AM, Jeff Davis wrote: The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. Frankly, I'm still not convinced that *anyone* will really need discrete range types -- as opposed to continuous range types, which I'm already using in production a la temporal. So I'm completely OK with making discrete range types hard to use, as long as continuous range types are easy to use. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Wed, Jan 5, 2011 at 04:56, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think pg_stat_replication is better than pg_stat_standby, but I'm still not convinced we shouldn't go with the obvious pg_stat_walsenders. How about pg_stat_replication_activity? If I understood correctly, the view is similar to pg_stat_activity, but displays information about connected standbys rather than regular backends. It's a bit of a long name, though. The view currently discussed is for *master* servers. We might have some views for replication activity in *standby* servers. So, I'd like to choose consistent and symmetric names for them -- for example, pg_stat_replication_master and pg_stat_replication_standby. I expected they would be pg_stat_wal_[senders|receivers] when I was writing the patch, but any other better names are welcome. However, we have the max_wal_senders GUC parameter, so users still need to know what wal_senders is. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] can shared cache be swapped to disk?
On Tue, Jan 4, 2011 at 2:52 PM, Martijn van Oosterhout klep...@svana.org wrote: On Tue, Jan 04, 2011 at 09:51:05AM -0800, Jeff Janes wrote: Correct. The kernel ignores locking requests because it's a great way to DOS a machine. For example, mlock() of large blocks of memory is also not permitted for similar reasons. Does it ignore such requests in general, or only under certain situations? If the latter, do you know what those situations are? Well, not in general, but for shared memory it's ignored (not sure about if you're root). It used to be that shared memory was always locked, which sounds like a great idea, until people started abusing it. So now shared memory is on ethe same footing as other memory. Not sure where I read this, I know it came up several years ago. I think it changed back in 2.0 times. RLIMIT_MEMLOCK exists, it has a small default hard limit, and only root can increase that. If root has gone out of its way to grant the postgres user a higher limit, the kernel should respect that, at least up until the situation become truly desperate. Like I said, not sure about how it works for root. I mean that root can increase it for *other* users. I've done the experiment on kernel 2.6.31.5, as a non-root user, and it looks like the kernel is respecting the SHM_LOCK. On a 2GB machine I set shared_buffers to 1200MB and run pgbench -S with scale of 80, and run it until it seems to be fully cached. (top doesn't distinguish between memory that has been requested but never accessed, versus memory that has been accessed and then truly swapped out to disk. So unless you first let it run to steady-state before applying pressure, it is hard to interpret the results.) Then I start up a Perl program that just perpetually loops through ~1.1 GB of memory. If I SHM_LOCK postgres's memory, then only perl starts swapping. If I don't lock it, then both perl and postgres start swapping. 
Obviously there is a lot of territory not covered here, but it looks like locking memory is respected in general. It still doesn't let you benefit from using shared_buffers that are a large portion of RAM (other than in silly test cases), and I don't know why that is, but I'm now pretty sure it isn't due to swapping out the shared memory. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
Simon Riggs wrote: Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. That seems like a good enough starting point to cover a lot of cases. Presuming the "two servers each at two sites" config that shows up in a lot of these discussions, people in the "I need sync to a remote spot" camp can get that, and if that site is unavailable for long enough to be kicked out they'll smoothly degrade to sync on a secondary local copy. And those who want high performance local sync and best-effort for the remote site can configure for that too, and if the local secondary dies they'll just degrade to the slow remote commits. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Kevin Grittner wrote: Greg Smith wrote: I could see shipping this with the automatic heavy LOCK TABLE in there. How would you handle or document behavior in REPEATABLE READ isolation? The lock doesn't do much good unless you acquire it before you get your snapshot, right? Hand-wave and hope you offer a suggested implementation? I haven't gotten to thinking about this part just yet--am still assimilating toward a next move after the pleasant surprise that this is actually working to some degree now. You're right that the high-level idea of "just lock the table" has to be mapped onto exact snapshot mechanics and pitfalls before moving in that direction will get very far. I'm probably not the right person to answer just exactly how feasible that is this week though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
David Fetter wrote: How about implementing an UPSERT command as "take the lock, do the merge"? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. The main argument against is that this path leads to a permanent non-standard wart to support forever, just to work around what should be a short-term problem. And I'm not sure whether reducing the goals to only this actually improves the ability to ship something in the near term too much. Many of the hard problems people are bothered by don't go away; it just makes the decision about which side of the speed/complexity trade-off you're more interested in more obvious. What I've been advocating is making that decision go away altogether by only worrying about the "simple to use and slow" path for now, but that's a highly debatable viewpoint and I appreciate the resistance to it, if it's possible to do at all. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] making an unlogged table logged
Somebody asked about this on Depesz's blog today, and I think it's come up here before too, so I thought it might be worth my writing up a few comments on this. I don't think I'm going to have time to work on this any time soon, but if someone else wants to work up a patch, I'm game to review. I think it'd clearly be a good feature.

Generally, to do this, it would be necessary to do the following things (plus anything I'm forgetting):

1. Take an AccessExclusiveLock on the target table. You might think that concurrent selects could be allowed, but I believe that's not the case. Read on.

2. Verify that there are no foreign keys referencing other unlogged tables, because if that were the case then after the change we'd have a permanent table referencing an unlogged table, which would violate referential integrity. (Note that unlogged referencing permanent is OK, but permanent referencing unlogged is a no-no, so what matters when upgrading is outbound foreign keys.)

3. Write out all shared buffers for the target table, and drop them. This ensures that there are no buffers floating around for the target relation that are marked BM_UNLOGGED, which would be a bad thing. Or maybe it's possible to just clear the BM_UNLOGGED flag, instead of dropping them. This is the step that makes me think we need an access exclusive lock - otherwise, somebody else might read in a buffer and, seeing that the relation is unlogged (which is true, since we haven't committed yet), mark it BM_UNLOGGED.

4. fsync() any segments of the target relation - of any fork except the init fork - that might have dirty pages not on disk.

5. Arrange for the appropriate file deletions at commit or abort, by updating pendingDeletes. On commit, we want to delete the init fork for the table and all its indexes. 
On abort, we want to delete everything else, but only for pretend; that is, the abort record should reflect the deletions since they'll need to happen on any standbys, but we shouldn't actually perform them on the master since we don't want to obliterate the contents of the table for no reason. There's a subtle problem here I'm not quite sure how to deal with: what happens if we *crash* without writing an abort record? It seems like that could leave a stray file around on a standby, because the current code only cleans things up on the standby at the start of recovery; to make this bullet-proof, I suppose it'd need to repeat that every time a crash happens on the master, but I don't know how to do that. Note also that if wal_level is minimal, then we need only worry about the commit case; the abort case can be a no-op.

6. If wal_level != minimal, XLOG every page of every fork except the init fork, for both the table and the associated indexes. (Note that this step also requires an AccessExclusiveLock rather than some weaker lock, because of the arbitrary rule that only AccessExclusiveLocks are sent to standbys. If we held only ShareRowExclusiveLock on the master, for example, a Hot Standby backend might read the table while it's only been half-copied.)

7. Update pg_class.relpersistence from 'u' to 'p', for both the table and the associated indexes.

Going the other direction ought to be possible too, although it seems somewhat less useful. For that, you'd need to flip around the check in step #2 (i.e. check for a reference FROM a permanent table), perform step #3, skip step #4, do step #5 backwards (create and log init forks, arranging for them to be removed on abort - this too has an issue with crashes that don't leave abort records behind - and arrange for the rest of the forks to be removed on commit on any standby without doing it on the master), skip step #6, and do step #7 backwards. 
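The unlogged-to-logged direction of the steps above can be outlined in pseudocode (all function names here are illustrative, not actual PostgreSQL routines):

```
# Hypothetical outline of steps 1-7 for making an unlogged table logged
make_table_logged(rel):
    lock(rel, AccessExclusiveLock)                          # step 1
    for fk in outbound_foreign_keys(rel):                   # step 2
        if is_unlogged(fk.referenced_table):
            error("permanent table may not reference unlogged table")
    flush_and_drop_buffers(rel)       # clear BM_UNLOGGED state, step 3
    fsync_all_forks_except_init(rel)                        # step 4
    schedule_delete_at_commit(init_fork(rel))               # step 5, plus indexes
    if wal_level != minimal:                                # step 6
        for page in every_fork_except_init(rel):
            xlog_full_page(page)
    set_relpersistence(rel, 'p')                            # step 7, plus indexes
```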
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED - core functionality
On Sat, Jan 1, 2011 at 11:54 PM, Robert Haas robertmh...@gmail.com wrote: Hanada-san, can you rebase the fdw_scan patch over what I committed and post an updated version ASAP? It'd be better for Heikki or Tom to work on that part of this than me, since they have a better understanding of the executor than I do, but I'm sure that they will not want to work from the previously posted patches as the changes I made are fairly extensive. Is anyone working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication as a separate permissions
On Mon, Jan 3, 2011 at 5:50 PM, Magnus Hagander mag...@hagander.net wrote: On Mon, Jan 3, 2011 at 17:23, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 3, 2011 at 11:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On the other hand, the REPLICATION privilege is denying you the right to perform an operation *even though you already are authenticated as a superuser*. I don't think there's anywhere else in the system where we allow a privilege to non-super-users but deny that same privilege to super-users, and I don't think we should be starting now. You might want to reflect on rolcatupdate a bit before asserting that there are no cases where privileges are ever denied to superusers. Oh, huh. I wasn't aware of that. However, that precedent would suggest that the default should be to grant the replication bit to superusers. Yes it would. Just to be clear: are we saying that CREATE ROLE foo SUPERUSER should grant both superuser and replication, as well as the default postgres user also having replication as well? I think that's what we're saying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
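The behavior being agreed on might look like this at the SQL level (hedged sketch: role names are invented and the final 9.1 syntax may differ):

```sql
-- Superusers would implicitly carry the replication privilege,
-- matching the rolcatupdate-style default discussed above.
CREATE ROLE admin SUPERUSER;

-- A dedicated non-superuser role for streaming replication connections.
CREATE ROLE walreader REPLICATION LOGIN;

-- An ordinary role: no replication connections allowed.
CREATE ROLE app_user LOGIN;
```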
Re: [HACKERS] making an unlogged table logged
On Tue, Jan 4, 2011 at 7:41 PM, Robert Haas robertmh...@gmail.com wrote: Somebody asked about this on Depesz's blog today, and I think it's come up here before too, so I thought it might be worth my writing up a few comments on this. I don't think I'm going to have time to work on this any time soon, but if someone else wants to work up a patch, I'm game to review. I think it'd clearly be a good feature. Generally, to do this, it would be necessary to do the following things (plus anything I'm forgetting): 1. Take an AccessExclusiveLock on the target table. You might think that concurrent selects could be allowed, but I believe that's not the case. Read on. 2. Verify that there are no foreign keys referencing other unlogged tables, because if that were the case then after the change we'd have a permanent table referencing an unlogged table, which would violate referential integrity. (Note that unlogged referencing permanent is OK, but permanent referencing unlogged is a no-no, so what matters when upgrading is outbound foreign keys.) 3. Write out all shared buffers for the target table, and drop them. This ensures that there are no buffers floating around for the target relation that are marked BM_UNLOGGED, which would be a bad thing. Or maybe it's possible to just clear the BM_UNLOGGED flag, instead of dropping them. This is the step that makes me think we need an access exclusive lock - otherwise, somebody else might read in a buffer and, seeing that the relation is unlogged (which is true, since we haven't committed yet), mark it BM_UNLOGGED. 4. fsync() any segments of the target relation - of any fork except that init fork - that might have dirty pages not on disk. 5. Arrange for the appropriate file deletions at commit or abort, by updating pendingDeletes. On commit, we want to delete the init fork for the table and all its indexes. 
On abort, we want to delete everything else, but only for pretend; that is, the abort record should reflect the deletions since they'll need to happen on any standbys, but we shouldn't actually perform them on the master since we don't want to obliterate the contents of the table for no reason. There's a subtle problem here I'm not quite sure how to deal with: what happens if we *crash* without writing an abort record? It seems like that could leave a stray file around on a standby, because the current code only cleans things up on the standby at the start of recovery; to make this bullet-proof, I suppose it'd need to repeat that every time a crash happens on the master, but I don't know how to do that. Note also that if wal_level is minimal, then we need only worry about the commit case; the abort case can be a no-op. 6. If wal_level != minimal, XLOG every page of every fork except the init fork, for both the table and the associated indexes. (Note that this step also requires an AccessExclusiveLock rather than some weaker lock, because of the arbitrary rule that only AccessExclusiveLocks are sent to standbys. If we held only ShareRowExclusiveLock on the master, for example, a Hot Standby backend might read the table while it's only been half-copied.) 7. Update pg_class.relpersistence from 'u' to 'p', for both the table and the associated indexes. Going the other direction ought to be possible too, although it seems somewhat less useful. For that, you'd need to flip around the check in step #2 (i.e. check for a reference FROM a permanent table), perform step #3, skip step #4, do step #5 backwards (create and log init forks, arranging for them to be removed on abort - this too has an issue with crashes that don't leave abort records behind); and arrange for the rest of the forks to be removed on commit on any standby without doing it on the master), skip step #6, and do step #7 backwards. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

A couple thoughts:

1. Could making a table logged use a non-exclusive lock if the ALTER is allowed to take a full checkpoint?

2. Unlogged to logged has a giant use case.

3. In MySQL I have had to ALTER tables to engine BLACKHOLE because they held data that was not vital, but the server was out of I/O. Going logged -> unlogged has a significant place, I think.

-- Rob Wultsch wult...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
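[Editor's sketch: the verification in Robert's step 2 - rejecting the conversion when the table has an outbound foreign key into a table that is still unlogged - can be expressed as a catalog query. The table name and the exact query are illustrative, not from any posted patch:

    -- Hypothetical check for step 2: list outbound foreign keys from the
    -- table being made logged whose referenced table is still unlogged.
    -- Any rows returned mean the conversion must be rejected, since it
    -- would leave a permanent table referencing an unlogged one.
    SELECT con.conname,
           ref.relname AS referenced_table
    FROM pg_constraint con
    JOIN pg_class ref ON ref.oid = con.confrelid
    WHERE con.conrelid = 'my_unlogged_table'::regclass  -- invented name
      AND con.contype = 'f'              -- foreign key constraints only
      AND ref.relpersistence = 'u';      -- referenced table is unlogged

relpersistence here is the 9.1 pg_class column Robert mentions in step 7.]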
Re: [HACKERS] SQL/MED - core functionality
On Tue, 4 Jan 2011 22:16:26 -0500 Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 1, 2011 at 11:54 PM, Robert Haas robertmh...@gmail.com wrote: Hanada-san, can you rebase the fdw_scan patch over what I committed and post an updated version ASAP? It'd be better for Heikki or Tom to work on that part of this than me, since they have a better understanding of the executor than I do, but I'm sure that they will not want to work from the previously posted patches, as the changes I made are fairly extensive. Is anyone working on this?

Sorry for the late reply; I'm working on this item. I will post a rebased fdw_scan patch soon.

Regards,
-- Shigeru Hanada
Re: [HACKERS] making an unlogged table logged
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

 A couple thoughts: 1. Could making a table logged use a non-exclusive lock if the ALTER is allowed to take a full checkpoint?

If possible, that would certainly be better. If the bgwriter is doing what it is supposed to, it would be relatively painless.

 2. Unlogged to logged has a giant use case.

Agreed.

JD

-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] SQL/MED - core functionality
On Sat, 1 Jan 2011 23:54:05 -0500 Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 27, 2010 at 10:16 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas robertmh...@gmail.com wrote: I'm working on getting a first chunk of this committed. OK, here's the patch. I've now committed a version of this with a bunch of further revisions, corrections, and cleanup. It looks to me as though this patch was written based on the 9.0 code and not thoroughly updated for some of the 9.1 changes, but I think I cleaned most of that up. With a patch of this size, I am sure there are a few things I overlooked, so please point 'em out and I'll try to fix them promptly.

While testing the commit, I found that CREATE FOREIGN TABLE requires an unnecessary USAGE privilege on the FOREIGN DATA WRAPPER. The SQL/MED standard requires only USAGE on the SERVER, as follows:

    1) If the foreign table definition is contained in an SQL-client module, then the enabled authorization identifiers shall include A.
    2) The applicable privileges shall include the USAGE privilege on the foreign server identified by the foreign server name.
    3) Additional privileges, if any, necessary to execute the foreign table definition are implementation-defined.

Sorry, this problem comes from the original patch. OTOH, the documentation of this behavior on the GRANT page is correct.

Regards,
-- Shigeru Hanada

20110105-no_fdw_check.patch Description: Binary data
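[Editor's sketch: under the standard text quoted above, a server-level grant alone should entitle a role to run CREATE FOREIGN TABLE. Server, wrapper, and role names here are invented for illustration:

    -- Per the SQL/MED rule quoted above, this should be sufficient:
    GRANT USAGE ON FOREIGN SERVER srv_remote TO bob;

    -- The extra check the patch removes would additionally have demanded:
    --   GRANT USAGE ON FOREIGN DATA WRAPPER some_fdw TO bob;

]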
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 16:45 -0800, Josh Berkus wrote: On 1/4/11 10:18 AM, Jeff Davis wrote: The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. Frankly, I'm still not convinced that *anyone* will really need discrete range types

Well, *need* is a standard that can never be met. But with something like a date range, it's very possible that a discrete version matches the real-world problem more closely than a continuous one. If you use only continuous ranges, then be careful to stick with exactly one convention, or you will likely get wrong results (I think this point has already been established). That sounds easy, but consider:

* If you want to know whether two ranges are adjacent (a common requirement), then you need to use [ ) or ( ].
* If you need to map a single point into a range, the only thing that makes sense is [ ].
* If your query contains current_date, you'll probably want ranges that are either in ( ] or [ ] form.
* If you are mixing data sets, they may use different conventions.

You can work around all of these problems by making the query more complex (and more error-prone). But I wouldn't like to give up on discrete ranges for types where it really makes sense (dates, IPs, integers).

Regards, Jeff Davis
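[Editor's sketch: to make the adjacency point concrete, here is how it looks with the range syntax that eventually shipped in core - daterange and the -|- adjacency operator are from the released feature, not necessarily the WIP patch under discussion:

    -- With the half-open '[)' convention, adjacency reduces to testing
    -- whether one range's upper bound equals the other's lower bound:
    SELECT daterange('2011-01-01', '2011-02-01', '[)')
       -|- daterange('2011-02-01', '2011-03-01', '[)');   -- adjacent

    -- With closed-closed '[]' ranges over a continuous domain there is
    -- no shared boundary point, so an adjacency test cannot be written
    -- without assuming a granule -- which is exactly the discrete case.

]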
Re: [HACKERS] We need to log aborted autovacuums
Josh Berkus wrote: I've been trying to diagnose in a production database why certain tables never get autovacuumed despite having a substantial % of updates. The obvious reason is locks blocking autovacuum from vacuuming the table ...

Missed this discussion when it popped up, but I have plenty to say about it now. What I do here is look for such anomalies using pg_stat_user_tables: whether the dead rows number has exceeded the vacuum threshold. That comparison is logged in the code at DEBUG3:

    elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
         NameStr(classForm->relname),
         vactuples, vacthresh, anltuples, anlthresh);

But a rough computation isn't too hard to derive in a report, so long as you haven't customized per-table parameters. As you suggested in your message, the really bad cases here stick out a whole lot. If you pay the slightest amount of attention to the dead row percentages, they jump right out at you. This all works easily on any version back to 8.3. Not having as much relevant data stored in pg_stat_user_tables makes the problem cases less obvious to spot in older versions. If I start seeing these badly maintained tables and suspect locking is getting in the way, I then dump traces from pg_locks + pg_stat_activity often enough that I can estimate how often someone has an interfering lock and what they're doing.

Should the log level on this message go up from DEBUG3? I could see rewriting it so that it logs at DEBUG1 instead when Log_autovacuum_min_duration is set *and* the trigger threshold is crossed, and at DEBUG3 the rest of the time. Given that you can derive this with a bit of work in userland, I don't see this even being justified as an INFO or LOG level message. Anytime I can script a SQL-level monitor for something that's easy to tie into Nagios or something, I greatly prefer that to log file scraping anyway.
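[Editor's sketch: the rough userland computation Greg describes might look like the following, assuming the default autovacuum settings (threshold 50, scale factor 0.2) and no per-table overrides; the query itself is illustrative:

    -- Approximate the autovacuum trigger test from pg_stat_user_tables:
    --   vacthresh = autovacuum_vacuum_threshold
    --             + autovacuum_vacuum_scale_factor * reltuples
    SELECT s.relname,
           s.n_dead_tup,
           round(50 + 0.2 * c.reltuples) AS vac_threshold,
           s.n_dead_tup > 50 + 0.2 * c.reltuples AS over_threshold
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    ORDER BY s.n_dead_tup DESC;

Tables persistently showing over_threshold = true with no recent last_autovacuum are the anomalies worth investigating.]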
What I'd like to do is add some logging code to autovacuum.c so that if log_autovacuum is any value other than -1, failure to vacuum due to locks gets logged. Does this make sense?

The general idea is interesting and probably more productive for the situation you theorize is happening than messing with the logging discussed above. But that's not where the code needs to go--the lock isn't taken until much further down the function call stack. The attached quickie (and only tested for compilation) patch probably does what you want here. Since this would eliminate the messy follow-up step I sometimes have gone through, dumping pg_locks data to confirm or rule out locking issues messing with AV processing, I can see some potential that it may have simplified situations I've run into in the past. And it's not out of line with the logging verbosity of similar failure mode tests that follow it. Right now failure to acquire a lock is just not considered a log-worthy issue, and I agree that it's worth considering whether it should be. If you could gather more info on whether this logging catches the problem cases you're seeing, that would really be the right test for the patch's usefulness. I'd give you solid 50/50 odds that you've correctly diagnosed the issue, and knowing for sure would make advocating for this logging a pretty easy sell to me at least.
-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2f68df4..2124e25 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 851,856 ****
--- 851,864 ----
  	{
  		PopActiveSnapshot();
  		CommitTransactionCommand();
+
+ 		if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
+ 		{
+ 			ereport(INFO,
+ 					(errmsg("skipping \"%s\" --- cannot open or obtain lock",
+ 							RelationGetRelationName(onerel))));
+ 		}
+
  		return;
  	}
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On Tue, Jan 04, 2011 at 09:27:10PM -0500, Greg Smith wrote: David Fetter wrote: How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so.

The main argument against is that that path leads to a permanent non-standard wart to support forever, just to work around what should be a short-term problem. And I'm not sure whether reducing the goals to only this actually improves the ability to ship something in the near term too much.

I think I haven't communicated clearly what I'm suggesting, which is that we ship with both an UPSERT and a MERGE, the former being ugly, crude and simple, and the latter festooned with dire warnings about isolation levels and locking. If shipping with a wart, as you term it, isn't acceptable, then I'd advocate going with just MERGE and documenting it inside and out, including one or more clearly written UPSERT and/or REPLACE INTO recipes.

Cheers, David.

-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
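[Editor's sketch: one of the classic UPSERT recipes David alludes to is the retry loop shown in the PostgreSQL documentation; this is adapted from that well-known example, with invented table and column names:

    -- Retry-loop UPSERT in PL/pgSQL.  The counters table with columns
    -- key (unique) and n is invented for illustration.
    CREATE FUNCTION upsert_counter(k text, delta integer) RETURNS void AS $$
    BEGIN
        LOOP
            -- First try to update an existing row.
            UPDATE counters SET n = n + delta WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- Not there: try to insert.  If another session inserted the
            -- same key concurrently, catch the error and retry the UPDATE.
            BEGIN
                INSERT INTO counters(key, n) VALUES (k, delta);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- Do nothing, and loop to try the UPDATE again.
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Note the recipe is only safe for the single-unique-key case, which is part of why it stays a recipe rather than a general MERGE.]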