Re: [HACKERS] [GENERAL] How do I bump a row to the front of sort efficiently
On Mon, Feb 2, 2015 at 1:16 AM, Sam Saffron sam.saff...@gmail.com wrote:
> However, the contortions on the above query make it very un-ORM friendly as I would need to define a view for it but would have no clean way to pass limits and offsets in.

This is why ORMs are bad. They make hard problems *much* harder, and the only benefit is that they maybe make easy problems a little quicker. The cost/savings is *heavily* skewed toward the cost, since there's no upper bound on the cost and there is a pretty small lower bound on the savings.

Micro-ORMs tend to do a better job of not shielding you from (or rather, getting in the way of) the SQL while still providing some good result-to-object translation. Whether even that is necessary depends on your language, though. (For example, in Python, psycopg2 has a built-in way of spitting out namedtuples, which means you get result-to-object translation out of the box. That makes even a micro-ORM pretty unnecessary. On the other hand, a micro-ORM that does this well without blocking you from the SQL, such as PetaPOCO, is a boon in .NET.)

If you can, your best bet would probably be to find a way to get your ORM to execute raw SQL (with good parametrization to prevent injection attacks) and be done with it. It took me way too much experience fighting with an ORM on complicated queries to realize that.
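The raw-SQL-with-parameters-plus-namedtuples approach can be sketched in stdlib Python. This uses sqlite3 purely as a stand-in for psycopg2 (which provides the same idea against PostgreSQL via its NamedTupleCursor); the table and column names are made up for illustration:

```python
import sqlite3
from collections import namedtuple

def namedtuple_factory(cursor, row):
    # Build a namedtuple type from the cursor's column names,
    # giving attribute-style access to rows without any ORM.
    Row = namedtuple("Row", [col[0] for col in cursor.description])
    return Row(*row)

conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO topics (title) VALUES (?)", ("hello",))

# Raw SQL with a bound parameter -- the driver handles quoting,
# so user-supplied values cannot inject SQL.
rows = conn.execute(
    "SELECT id, title FROM topics WHERE id = ?", (1,)
).fetchall()
print(rows[0].title)  # attribute access, no ORM required
```

The point is that result-to-object translation is a few lines of driver configuration, while the SQL itself stays fully under your control.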
Re: [HACKERS] Fetch zero result rows when executing a query?
On 2/3/15 5:26 AM, Shay Rojansky wrote:
> Sorry if this has been asked before, couldn't find any mention... I'm working on Npgsql, the .NET driver for PostgreSQL, and am trying to find a way to execute a query without fetching any rows. The Execute message has a maximum result-row count, but zero is documented to mean "fetch all rows". The use case would be sending a query which might modify or might not (e.g. UPDATE), where we know that the user is uninterested in any result row. My current workaround is to specify maxrows=1; I was wondering if I missed a better alternative.

You might be able to add something like WHERE FALSE to the RETURNING clause, but I agree with Andres; this seems like premature optimization.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On 2/3/15 5:08 PM, Tom Lane wrote:
> Jim Nasby jim.na...@bluetreble.com writes:
>> On 2/3/15 9:20 AM, Tom Lane wrote:
>>> Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ]
>> VACUUM puts the options before the table name, so ISTM it'd be best to keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or REINDEX {INDEX | ...} (options).
> Well, I really really don't like the first of those. IMO the command name is REINDEX INDEX etc, so sticking something in the middle of that is bogus.

Actually, is there a reason we can't just accept all 3? Forcing people to remember exact ordering of options has always struck me as silly.
Re: [HACKERS] pg_dump's aborted transactions
Stephen Frost sfr...@snowman.net writes:
> All, We recently had a client complain that check_postgres' commitratio check would alert about relatively unused databases. As it turns out, the reason is that they automate running pg_dump against their databases (surely a good thing..), but pg_dump doesn't close out its transaction cleanly, leading to rolled-back transactions. At first blush, at least, this strikes me as an oversight which we should probably fix and possibly backpatch.

No, somebody should fix check_postgres to count rollbacks as well as commits as activity (as they obviously are). This is not an oversight, it's 100% intentional. The reason pg_dump aborts rather than commits is to make entirely sure that it does not commit any changes to the database. I would be against removing that safety feature, considering that pg_dump is typically run as superuser. We have frequently worried about security exploits that involve hijacking superuser activities, and this behavior provides at least a small increment of safety against such holes.

			regards, tom lane
Re: [HACKERS] Table description in the data file (Re: pg_rawdump)
On 2/3/15 1:43 AM, Stephen R. van den Berg wrote:
> In relation to the talk and discussions at FOSDEM regarding helping data recovery, I searched the archives for the old thread after I performed my last recovery; for reference: http://www.postgresql.org/message-id/20101019201223.ga15...@cuci.nl I haven't checked yet if the proposed space there is still available in the current disk format, but it might serve as a starter and reminder to get the discussion going once more.

Rather than trying to wedge this into a heap page, ISTM it'd be better to use a fork, presumably storing regular tuples that carry the essential data from pg_class and pg_attribute (and maybe pg_type).
Re: [HACKERS] [GENERAL] How do I bump a row to the front of sort efficiently
On Tue, Feb 3, 2015 at 9:33 PM, BladeOfLight16 bladeofligh...@gmail.com wrote:
> This is why ORMs are bad. They make hard problems *much* harder, and the only benefit is that they maybe make easy problems a little quicker. The cost/savings is *heavily* skewed toward the cost, since there's no upper bound on the cost and there is a pretty small lower bound on the savings. [snip]

Er, *pretty small upper bound* on the savings.
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
Jim Nasby jim.na...@bluetreble.com writes:
> On 2/3/15 5:08 PM, Tom Lane wrote:
>> Jim Nasby jim.na...@bluetreble.com writes:
>>> VACUUM puts the options before the table name, so ISTM it'd be best to keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or REINDEX {INDEX | ...} (options).
>> Well, I really really don't like the first of those. IMO the command name is REINDEX INDEX etc, so sticking something in the middle of that is bogus.
> Actually, is there a reason we can't just accept all 3? Forcing people to remember exact ordering of options has always struck me as silly.

And that's an even worse idea. Useless flexibility in syntax tends to lead to unfortunate consequences like having to reserve keywords.

			regards, tom lane
[HACKERS] pg_dump's aborted transactions
All,

We recently had a client complain that check_postgres' commitratio check would alert about relatively unused databases. As it turns out, the reason is that they automate running pg_dump against their databases (surely a good thing..), but pg_dump doesn't close out its transaction cleanly, leading to rolled-back transactions.

At first blush, at least, this strikes me as an oversight which we should probably fix and possibly backpatch.

Thoughts?

Thanks,

Stephen
[HACKERS] Re: [COMMITTERS] pgsql: Add API functions to libpq to interrogate SSL related stuff.
On 02/04/2015 02:23 AM, Tom Lane wrote:
> Heikki Linnakangas heikki.linnakan...@iki.fi writes:
>> Add API functions to libpq to interrogate SSL related stuff.
> This patch is one large brick shy of a load: it creates exported libpq functions but fails to ensure they always exist. That's why jacana is unhappy; though TBH I'm astonished that any non-ssl-enabled builds are passing. Apparently missing library functions are less of a hard error on Linux than they ought to be.

Yeah, that's surprising. I don't see any option in the ld man page to make it warn either.

This also shows that we don't have any regression tests for this function. It's trivial, so I'm not worried about that, but in general it would be good to have a regression test suite specifically for libpq. There are probably many other things that are not exercised by psql.

> I think probably the exported functions need to be defined in fe-exec.c or fe-connect.c, with bodies along the lines of
>
>     #ifdef USE_OPENSSL
>         call OpenSSL-specific function
>     #else
>         return NULL
>     #endif
>
> (or whatever's appropriate when no SSL support). We do want these functions to exist even in non-SSL-enabled builds.

Sure. There are dummy versions of all the other SSL-related functions, I just missed PQsslAttributes. The OpenSSL versions are in fe-secure-openssl.c, and the dummy ones are in fe-secure.c, within a #ifndef USE_SSL block. Fixed now.

- Heikki
Re: [HACKERS] Redesigning checkpoint_segments
On 02/02/2015 04:21 PM, Andres Freund wrote:
> Hi,
>
> On 2015-02-02 08:36:41 -0500, Robert Haas wrote:
>> Also, I'd like to propose that we set the default value of max_checkpoint_segments/checkpoint_wal_size to something at least an order of magnitude larger than the current default setting.
> +1

I don't agree with that principle. I wouldn't mind increasing it a little bit, but not by an order of magnitude. For better or worse, *all* our defaults are tuned toward small systems, so that PostgreSQL doesn't hog all the resources. We shouldn't make an exception for this.

> I think we need to increase checkpoint_timeout too - that's actually just as important for the default experience from my pov. 5 minutes often just unnecessarily generates FPWs en masse.
>
>> I'll open the bidding at 1600MB (aka 100).
> Fine with me.

I wouldn't object to raising it a little bit, but that's way too high. It's entirely possible to have a small database that generates a lot of WAL. A table that has only a few rows, but is updated very very frequently, for example. And checkpointing such a database is quick too, so frequent checkpoints are not a problem. You don't want to end up with 1.5 GB of WAL on a 100 MB database.

- Heikki
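For reference, the settings under discussion would look something like this in postgresql.conf. This is a sketch of the proposal only; checkpoint_wal_size was a proposed name and not a shipped GUC at the time (the redesign later landed as max_wal_size in 9.5), and the values are the ones being debated above, not recommendations:

```
# Sketch of Robert's opening bid (hypothetical, not shipped defaults):
checkpoint_wal_size = 1600MB   # ~100 old-style 16MB checkpoint_segments
checkpoint_timeout  = 5min     # Andres argues this default should rise too
```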
[HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
Hi,

I think these days there's no reason for the split between the archive and hot_standby wal levels. The split was made out of volume and stability concerns. I think we can by now be confident about the wal_level = hot_standby changes (note I'm not proposing hot_standby = on). So let's remove the split. It just gives users a choice between two options that don't have a meaningful difference.

Additionally I think we should change the default for wal_level to hot_standby and max_wal_senders (maybe to 5). That way users can use pg_basebackup and set up streaming standbys without having to restart the primary. I think that'd be an important step in making setup easier.

Previously there have been arguments against changing the default of wal_level because it'd mean the regression tests wouldn't exercise minimal anymore. That might be true, but then right now we just don't exercise the more complex levels. If we're really concerned we can just force a different value during the tests, just as we do for prepared xacts.

Comments?

Additionally, more complex and further into the future, I wonder if we couldn't also get rid of wal_level = logical by automatically switching to it whenever logical slots are active.

Greetings,

Andres Freund

-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
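Concretely, the proposed out-of-the-box defaults would amount to something like this in postgresql.conf (a sketch of the proposal, not the shipped defaults of the era; at the time wal_level defaulted to minimal and max_wal_senders to 0):

```
# Sketch of Andres' proposed defaults:
wal_level = hot_standby   # replaces 'minimal'; also subsumes 'archive'
max_wal_senders = 5       # pg_basebackup / streaming standbys without a restart
```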
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On Tue, Feb 3, 2015 at 9:09 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
> On Tue, Feb 3, 2015 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Sawada Masahiko sawada.m...@gmail.com writes:
>>> On Mon, Feb 2, 2015 at 9:21 PM, Michael Paquier michael.paqu...@gmail.com wrote:
>>>> Now, I think that it may be better to provide the keyword VERBOSE before the type of object reindexed, as REINDEX [ VERBOSE ] object.
>>> Actually, my first WIP version of the patch added the VERBOSE word before the type of object. I'm finding it difficult to decide the position of the VERBOSE word in the REINDEX statement.
>> The way that FORCE was added to REINDEX was poorly thought out; let's not double down on that with another option added without any consideration for future expansion. I'd be happier if we adopted something similar to the modern syntax for VACUUM and EXPLAIN, ie, comma-separated options in parentheses.
> I understood. I'm imagining the new REINDEX syntax as follows:
>
>     REINDEX (INDEX, VERBOSE) hoge_idx;
>     REINDEX (TABLE) hoge_table;
>
> i.e., I will add the following syntax format:
>
>     REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] ) name [FORCE];
>
> Thoughts?

I don't think the keywords INDEX, TABLE, SCHEMA, SYSTEM and DATABASE are options... they are part of the command IMHO. Maybe something like:

    REINDEX { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } [ ( [ FORCE ], [VERBOSE] ) ] name;

And maintain the old syntax for compatibility, of course.

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
Re: [HACKERS] Release note bloat is getting out of hand
On Tue, Feb 03, 2015 at 09:08:45AM -0500, Andrew Dunstan wrote:
> On 02/03/2015 08:55 AM, Kevin Grittner wrote:
>> Tom Lane t...@sss.pgh.pa.us wrote:
>>> Josh Berkus j...@agliodbs.com writes:
>>>> On 02/02/2015 05:39 PM, Peter Eisentraut wrote:
>>>>> I share the sentiment that the release notes *seem* too big, but the subsequent discussion shows that it's not clear why that's really a problem. Exactly what problem are we trying to fix?
>>>> At a rough count of lines, the release notes for unsupported versions are about 18% of documentation overall (47K out of 265K lines). So they're not insubstantial. Compared to the total size of the tarball, though ...
>>> It would not make that much of a difference in tarball size, agreed. It *would* make a difference in the build time and output size of the SGML docs --- as I mentioned at the outset, the release notes currently account for 25% of the SGML source linecount.
>> I run `make -s -j4 world` on my i7 fairly often, and it is often the doc build that I wind up waiting for at the end.
> I realize this is slightly OT, but I wonder if it might be worth having targets that build and install everything but the docs.

That'd be great.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Mon, Feb 2, 2015 at 9:10 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-11-12 16:11:58 -0500, Robert Haas wrote:
>> On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas robertmh...@gmail.com wrote:
>>> On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut pete...@gmx.net wrote:
>>>> If REINDEX cannot work without an exclusive lock, we should invent some other qualifier, like WITH FEWER LOCKS.
>>> What he said.
>> But more to the point: why, precisely, can't this work without an AccessExclusiveLock? And can't we fix that instead of settling for something clearly inferior?
> So, here's an alternative approach for how to get rid of the AEL locks. They're required because we want to switch the relfilenodes around. I've pretty much no confidence in any of the schemes anybody has come up with to avoid that.
>
> So, let's not switch relfilenodes around. I think we should instead just use the new index, repoint the dependencies onto the new oid, and then afterwards, when dropping, rename the new index onto the old one. That means the oid of the index will change, and some less-than-pretty grovelling around dependencies, but it still seems preferable to what we're discussing here otherwise.
>
> Does anybody see a fundamental problem with that approach?

I'm not sure whether that will work out, but it seems worth a try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
Andres Freund and...@2ndquadrant.com writes:
> Additionally I think we should change the default for wal_level to hot_standby and max_wal_senders (maybe to 5). That way users can use pg_basebackup and set up streaming standbys without having to restart the primary. I think that'd be an important step in making setup easier.

I always thought the reason for defaulting to minimal was performance. I'd like to see proof that the impact of wal_level = hot_standby is negligible before we consider doing this.

The argument that having to change one more GUC is an undue burden while configuring hot standby seems ridiculous from here. HS is not nearly "push the EASY button and you're done", and this change wouldn't make it so.

			regards, tom lane
Re: [HACKERS] Release note bloat is getting out of hand
On Mon, Feb 2, 2015 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Mon, Feb 2, 2015 at 3:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> So I'm bemused by Robert's insistence that he wants that format to support searches. As I said, I find it far more convenient to search the output of git log and/or src/tools/git_changelog --- I keep text files of those around for exactly that purpose.
>> I normally search in one of two ways. Sometimes I grep the sgml; other times, I go to, say, http://www.postgresql.org/docs/devel/static/release-9-4.html and then edit the URL to take me back to 9.3, 9.2, 9.1, etc. It's true that 'git log' is often the place to go searching for stuff, but there are times when it's easier to find out what release introduced a feature by looking at the release notes, and it's certainly more useful if you want to send a link to someone who is not git-aware illustrating the results of your search. Well, maybe I'm the only one who is doing this and it's not worth worrying about it just for me. But I do it, all the same.
> I'm not out to take away a feature you need. I'm just wondering why it has to be supported in exactly the way it's done now. Wouldn't a separately maintained release-notes-only document serve the purpose fine?

Well, I'd like to have all the release notes on the web site as they are now, and I'd like to have all the SGML in the build tree as it is now. If someone wants to create a "make docs-fast" target that omits older release notes, that's obviously not a problem. What I don't want is for information that is easily available now (put the right two digits into the URL bar and you are done) to suddenly require more thought to get at ("OK, so back through 9.0 is on the web site, and then there's this other place you can go to get the older stuff, if you can remember where it is").
Re: [HACKERS] Redesigning checkpoint_segments
Heikki Linnakangas hlinnakan...@vmware.com wrote:
> On 02/02/2015 04:21 PM, Andres Freund wrote:
>> On 2015-02-02 08:36:41 -0500, Robert Haas wrote:
>>> Also, I'd like to propose that we set the default value of max_checkpoint_segments/checkpoint_wal_size to something at least an order of magnitude larger than the current default setting.
>> +1
> I don't agree with that principle. I wouldn't mind increasing it a little bit, but not by an order of magnitude.

Especially without either confirming that this effect is no longer present, or having an explanation for it: http://www.postgresql.org/message-id/4A44E58C022500027FCF@gw.wicourts.gov

Note that Greg Smith found the same effect on a machine without any write caching, which shoots down my theory, at least on his machine: http://www.postgresql.org/message-id/flat/4bccdad5.3040...@2ndquadrant.com#4bccdad5.3040...@2ndquadrant.com

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] How about to have relnamespace and relrole?
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
> Most OID types have corresponding reg* types. These are very convenient when looking into the system catalogs/views, but there are no such types for user IDs and namespace IDs. What do you think about having these new OID types?

I'm not really excited about that. That line of thought would imply that we should have reg* types for every system catalog, which is surely overkill. The existing reg* types were chosen to handle the cases where objects have schema-qualified (and/or type-overloaded) names, so that correct lookup is not merely a matter of (select oid from pg_foo where name = 'bar') or vice versa. I think the policy is, or should be, that we create reg* types for exactly the cases where lookup isn't that simple. In particular, both of the cases you mention are hardly new. They existed when we made the current set of reg* types and were consciously not added then.

>     SELECT relnamespace::regnamespace, relname, relowner::regrole
>     FROM pg_class
>     WHERE relnamespace = 'public'::regnamespace;

Two reasons this isn't terribly compelling are (1) it's creating a join in a place where the planner can't possibly see it and optimize it, and (2) you risk MVCC anomalies because the reg* output routines would not be using the same snapshot as the calling query. We already have problem (2) with the existing reg* functions, so I'm not that excited about doubling down on the concept.

			regards, tom lane
Re: [HACKERS] Release note bloat is getting out of hand
On 2/3/15 4:04 PM, David Fetter wrote:
> On Mon, Feb 02, 2015 at 08:56:19PM -, Greg Sabino Mullane wrote:
>> Robert Haas wrote:
>>> but there are times when it's easier to find out what release introduced a feature by looking at the release notes, and it's certainly more useful if you want to send a link to someone who is not git-aware illustrating the results of your search. Well, maybe I'm the only one who is doing this and it's not worth worrying about it just for me. But I do it, all the same.
>> I do this *all the time*. Please don't mess with the release notes. Except to put them all on one page for easy searching. That would be awesome.
> Truly awesome. When supporting older versions, being able to find precisely when a feature was introduced in a single search through a file, or find-in-page for those using web browsers, would really smooth off some burrs.

And now that we're on the subject of ponies, it would be nice if the relevant git hashes were included as well.

.m
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 02/03/2015 03:44 AM, Jim Nasby wrote:
> [snip]
>> The alternative could be some long LOB (HugeOBject?) using the equivalent to serial8, whereas regular LOBs would use serial4.
> Well, it depends on how we did this. We could (for example) add a field to pg_class that determines what type to use for toast pointers; OID, int, or bigint. That could then be taken into account in the *toast* functions. But as others have pointed out, we haven't even had any real complaints about toast using OIDs as being an issue until now, so I think it's premature to start messing with this. At most it's just something to keep in mind so we don't preclude doing this in the future.

A patch creating those HOBs (Huge Objects) might well make sense *after* the sequence refactoring got merged. Removing the bottleneck due to the OID allocator for this use case will be definitively welcome (I don't dare to code that just yet, but here's hoping someone will step in O:-)

> BTW, regarding the size of what gets toasted; I've often thought it would be useful to allow a custom size limit on columns so that you could easily force data to be toasted if you knew you were very unlikely to access it. Basically, a cheap form of vertical partitioning.

Hmmm... ALTER COLUMN SET STORAGE EXTERNAL / SET STORAGE EXTENDED? From http://www.postgresql.org/docs/9.4/static/sql-altertable.html :

    ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

This would do what you described, right?

HTH,

/ J.L.
Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables
Matthew Kelly mke...@tripadvisor.com writes:
> However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids. I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn't protect you from this potential problem.

That may be a hazard, but ...

> That being said, I'd be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional work around for those of us with larger than average installs.

... this fix would actually make things enormously worse. With the single counter feeding all tables, you at least have a reasonable probability that there are not enormously long runs of consecutive OIDs in any one toast table. With a sequence per table, you are nearly guaranteed that there are such runs, because inserts into other tables don't create a break. (This effect is also why you're wrong to claim that partitioning can't fix it.)

			regards, tom lane
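Tom's run-length argument can be illustrated with a toy simulation (a hypothetical workload sketch, not the actual allocator): a shared counter interleaves allocations across tables, so any one table's OIDs have gaps, while per-table sequences hand each table one unbroken run.

```python
import random

def longest_consecutive_run(oids):
    """Length of the longest run of consecutive integers in the set."""
    oids = sorted(oids)
    best = run = 1
    for prev, cur in zip(oids, oids[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best

random.seed(42)
tables, inserts = 10, 10_000

# Shared counter: inserts into other tables break up each table's runs.
shared = [[] for _ in range(tables)]
counter = 0
for _ in range(inserts):
    shared[random.randrange(tables)].append(counter)
    counter += 1

# Per-table sequences: every table's values are perfectly consecutive.
per_table = [list(range(inserts // tables)) for _ in range(tables)]

print(max(longest_consecutive_run(t) for t in shared))     # short runs
print(max(longest_consecutive_run(t) for t in per_table))  # one unbroken run
```

The longest run bounds how many GetNewOidWithIndex probes a wrapped counter can be forced through before finding a free value, which is why long runs are the dangerous case.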
Re: [HACKERS] Release note bloat is getting out of hand
On Mon, Feb 02, 2015 at 08:56:19PM -, Greg Sabino Mullane wrote:
> Robert Haas wrote:
>> but there are times when it's easier to find out what release introduced a feature by looking at the release notes, and it's certainly more useful if you want to send a link to someone who is not git-aware illustrating the results of your search. Well, maybe I'm the only one who is doing this and it's not worth worrying about it just for me. But I do it, all the same.
> I do this *all the time*. Please don't mess with the release notes. Except to put them all on one page for easy searching. That would be awesome.

Truly awesome. When supporting older versions, being able to find precisely when a feature was introduced in a single search through a file, or find-in-page for those using web browsers, would really smooth off some burrs.

Cheers,
David.
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
Sawada Masahiko sawada.m...@gmail.com writes:
> On Tue, Feb 3, 2015 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> The way that FORCE was added to REINDEX was poorly thought out; let's not double down on that with another option added without any consideration for future expansion. I'd be happier if we adopted something similar to the modern syntax for VACUUM and EXPLAIN, ie, comma-separated options in parentheses.
> I understood. I'm imagining the new REINDEX syntax as follows:
>
>     REINDEX (INDEX, VERBOSE) hoge_idx;
>     REINDEX (TABLE) hoge_table;
>
> i.e., I will add the following syntax format:
>
>     REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] ) name [FORCE];

Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like

    REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ]

    option := FORCE | VERBOSE

We'd still keep the historical syntax where you can write FORCE outside parens, but it'd be deprecated. Where to insert the parenthesized option list is a judgment call, but I'd lean to keeping it at the end where FORCE used to be.

			regards, tom lane
Re: [HACKERS] Unnecessary pointer-NULL checks in pgp-pgsql.c
On Sun, Feb 1, 2015 at 9:14 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> Coverity is pointing out that we are doing pointer-NULL checks on things that cannot be NULL in decrypt_internal():
>
>     out:
>     -	if (src)
>     -		mbuf_free(src);
>     -	if (ctx)
>     -		pgp_free(ctx);
>     +	Assert(ctx != NULL && src != NULL && dst != NULL);
>     +	mbuf_free(src);
>     +	pgp_free(ctx);
>     	if (err)
>     	{
>     		px_set_debug_handler(NULL);
>     -		if (dst)
>     -			mbuf_free(dst);
>     +		mbuf_free(dst);
>
> src, dst and ctx are created respectively from mbuf_create_from_data, mbuf_create and pgp_init, which never return NULL as they are palloc'd all the time. I think that we could simplify things with the patch attached; note that I added an assertion for correctness, but I don't really think that it is much necessary.

Yeah, I'd drop the assertion. Also, how about changing things around slightly so that we lose the goto-label construct? There's only one goto, and it's only about 6 lines before the label, so we could just flip the sense of the if-test and put the code that gets skipped inside the if-block.
Re: [HACKERS] Release note bloat is getting out of hand
On Tue, Feb 3, 2015 at 3:51 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 2, 2015 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 2, 2015 at 3:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: So I'm bemused by Robert's insistence that he wants that format to support searches. As I said, I find it far more convenient to search the output of git log and/or src/tools/git_changelog --- I keep text files of those around for exactly that purpose. I normally search in one of two ways. Sometimes I grep the sgml; other times, I go to, say, http://www.postgresql.org/docs/devel/static/release-9-4.html and then edit the URL to take me back to 9.3, 9.2, 9.1, etc. It's true that 'git log' is often the place to go searching for stuff, but there are times when it's easier to find out what release introduced a feature by looking at the release notes, and it's certainly more useful if you want to send a link to someone who is not git-aware illustrating the results of your search. Well, maybe I'm the only one who is doing this and it's not worth worrying about it just for me. But I do it, all the same. I'm not out to take away a feature you need. I'm just wondering why it has to be supported in exactly the way it's done now. Wouldn't a separately maintained release-notes-only document serve the purpose fine? Well, I'd like to have all the release notes on the web site as they are now, and I'd like to have all the SGML in the build tree as it is now. If someone wants to create a make docs-fast target that omits older release notes, that's obviously not a problem. What I don't want to do is have information that is easily available now (put the right two digits into the URL bar and you are done) suddenly require more thought to get at (OK, so back through 9.0 is on the web site, and then there's this other place you can go to get the older stuff, if you can remember where it is). 
Not sure how feasible that is through build targets, but perhaps we could also include it in the HTML docs for the website per above, but skip it in the PDFs, which would make them a *lot* easier to deal with, I bet. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On 2015-02-03 10:20:03 -0500, Tom Lane wrote: Sawada Masahiko sawada.m...@gmail.com writes: On Tue, Feb 3, 2015 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: The way that FORCE was added to REINDEX was poorly thought out; let's not double down on that with another option added without any consideration for future expansion. I'd be happier if we adopted something similar to the modern syntax for VACUUM and EXPLAIN, ie, comma-separated options in parentheses. I understood. I'm imagining the new REINDEX syntax as follows: - REINDEX (INDEX, VERBOSE) hoge_idx; - REINDEX (TABLE) hoge_table; i.e., I will add the following syntax format, REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] ) name [FORCE]; Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ] option := FORCE | VERBOSE We'd still keep the historical syntax where you can write FORCE outside parens, but it'd be deprecated. Why would we allow force inside the parens, given it's a backward compat only thing afaik? Don't get me wrong, I'm not at all against an extensible syntax, I just don't see a point in further cargo culting FORCE. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On Tue, Feb 3, 2015 at 7:31 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/02/2015 03:36 PM, Robert Haas wrote: Second, I *think* that these settings are symmetric and, if that's right, then I suggest that they ought to be named symmetrically. Basically, I think you've got min_checkpoint_segments (the number of recycled segments we keep around always) and max_checkpoint_segments (the maximum number of segments we can have between checkpoints), essentially splitting the current role of checkpoint_segments in half. I'd go so far as to suggest we use exactly that naming. It would be reasonable to allow the value to be specified in MB rather than in 16MB units, and to specify it that way by default, but maybe a unit-less value should have the old interpretation since everybody's used to it. That would require adding GUC_UNIT_XSEG or similar, but that seems OK. Works for me. However, note that max_checkpoint_segments = 10 doesn't mean the same as current checkpoint_segments = 10. With checkpoint_segments = 10 you end up with about 2x-3x as much WAL as with max_checkpoint_segments = 10. So the "everybody's used to it" argument doesn't hold much water. Hmm, that's surprising. Why does that happen? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables
Hmm 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~9e13) bytes ... which results in 8192 1GB segments :O 8192 1GB segments is just 8TB, it's not _that_ large. At TripAdvisor we’ve been using a NoSQL solution to do session storage. We are looking to probably swap that out to be Postgres (every other db backing the site is Postgres). Essentially, what I’m building is a system with 1 logical table that maps session id to a 2KB+ grab bag of ever changing session attributes which is partially normalized, partially json. 315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones). Additionally, most http calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts’. Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. Well, it can for 38 hours… until you wrap xid’s on the toast table. :P I’ll be the first to admit that isn’t the normal use case though. I’m happy to have found this thread, however, because I’m going to have to build around the global oid counter, explicitly to prevent the problem I explain below regarding clustering. Anybody actually reaching this limit out there? Well, it's not the 4 billion row limit that concerns me, it's the global shared counter in conjunction with pg_restore/clustering that is actually pretty concerning. Just checked through all of TripAdvisor’s normal databases and the max tuples I see in a single toast table is 17,000,000, so that is still a couple of orders of magnitude too small. (however, close enough that it’ll be a concern in a few years). However, I do have active databases where the current oid is between 1 billion and 2 billion. 
They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids. I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn’t protect you from this potential problem. What's even more weird is that this issue can be triggered by consuming too many oid’s in a different database in the same cluster (i.e. creating large amounts of temp tables). The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the implementation slower for most, though this might actually not be that important. Well, you aren’t doubling the storage. Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple. If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by 0.7%. I’m sure there are more hidden costs but we are really only talking about a low single digit percent increase. In exchange, you get to drop one index scan per toast insert; an index scan looking in the only hot part of the index. That being said I’d be perfectly happy merely giving each TOAST table its own sequence as that almost entirely mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional work around for those of us with larger than average installs. - Matt K -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing markup in pg_receivexlog.sgml
On 02/03/2015 10:27 AM, Michael Paquier wrote: On Tue, Feb 3, 2015 at 4:25 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Flush the WAL data to disk immediately after it's being received. Also send a status packet back to the server immediately after flushing, regardless of <literal>--status-interval</> Yes, that's indeed better. As long as I am on it, attached is a patch for that... Ok, thanks :-). Applied. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Performance difference between kernel 3.10 and 3.12
I see a significant performance difference between systems using Linux kernel 3.10 and 3.12. We did tests on CentOS7 (kernel 3.10) and SuSE Enterprise Linux Server 12 (kernel 3.12). Both systems run on VMWare ESXi 5.5.0. Xeon CPU E52650 2GHz x 2 (16 cores in total) Mem 96GB, 10,000 RPM SAS drive Guest OS setting: CPU 4 Mem 6GB file system ext4 PostgreSQL 9.3.5 compiled from source code (all parameters are default) The benchmark tables are created by pgbench -i -s 500. pgbench -S -c NUM -T 500 runs 3 times for each NUM concurrent users and we calculate the average TPS for each run. We ran pgbench in 4 patterns: 1) SLES12 (kernel 3.12) 2) CentOS7 (kernel 3.10 with default deadline scheduler) 3) CentOS7 (kernel 3.10 with cfq scheduler) 4) CentOS7 (kernel replaced with 3.12 with cfq scheduler) The result is roughly: #1 > #4 > #2 > #3 So it seems the performance difference is dominated by the kernel version difference (3.10 vs. 3.12). Does anybody know the cause? Note that the kernel parameter sched_autogroup (which is available only with the 3.10 kernel) did not give much difference. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing markup in pg_receivexlog.sgml
On Tue, Feb 3, 2015 at 4:25 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Flush the WAL data to disk immediately after it's being received. Also send a status packet back to the server immediately after flushing, regardless of <literal>--status-interval</> Yes, that's indeed better. As long as I am on it, attached is a patch for that... -- Michael diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml index be321b5..0c99744 100644 --- a/doc/src/sgml/ref/pg_receivexlog.sgml +++ b/doc/src/sgml/ref/pg_receivexlog.sgml @@ -139,9 +139,9 @@ PostgreSQL documentation <term><option>--synchronous</option></term> <listitem> <para> -Issue sync commands as soon as there is WAL data which has not been -flushed yet. Also status packets are sent back to the server just after -WAL data is flushed whatever <literal>--status-interval</> is set to. +Flush the WAL data to disk immediately after it has been received. Also +send a status packet back to the server immediately after flushing, +regardless of <literal>--status-interval</>. </para> </listitem> </varlistentry> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small memory leak in execute.c of ECPG driver
On 02/03/2015 08:58 AM, Michael Paquier wrote: Hi all, In exactly 3 places of the ECPG driver (for numeric, for interval and for date), we do something as follows: /* Allocation of mallocedval */ if (!(mallocedval = ecpg_strdup("array [", lineno))) return false; for (element = 0; element < var->arrsize; element++) { int result; ptr = stuff_alloc(); if (!ptr) return false; <= Leak here of mallocedval It happens that if the allocation done within this for loop fails we leak mallocedval that was previously allocated. Attached is a patch to fix this issue spotted by Coverity. I think there are more similar leaks nearby. After the first hunk, there's another if-check with return false that also leaks mallocedval. Right after the two other hunks, if the ecpg_realloc fails, we again leak mallocedval. I wonder why Coverity didn't warn about those? Maybe it would've, after fixing the first ones. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fetch zero result rows when executing a query?
Sorry if this has been asked before, couldn't find any mention... I'm working on Npgsql, the .NET driver for PostgreSQL, and am trying to find a way to execute a query but without fetching any rows. The Execute message has a maximum result-row count, but zero is documented to mean fetch all rows. The use case would be sending a query which might modify or might not (e.g. UPDATE), but we know that the user is uninterested in any result row. My current workaround is to specify maxrows=1, was wondering if I missed a better alternative. Thanks, Shay
Re: [HACKERS] Fetch zero result rows when executing a query?
Hi, On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote: Sorry if this has been asked before, couldn't find any mention... I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to find a way to execute a query but without fetching any rows. The Execute message has a maximum result-row count, but zero is documented to mean fetch all rows. The use case would be sending a query which might modify or might not (e.g. UPDATE), but we know that the user is uninterested in any result row. My current workaround is to specify maxrows=1, was wondering if I missed a better alternative. Is this really a relevant optimization? If the user doesn't want results, RETURNING shouldn't be specified... Sure, sometimes the same query will be reused over cases where you want the results and those where you don't, but I doubt this is worthy of optimization. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On 2/3/15 9:20 AM, Tom Lane wrote: i.e., I will add the following syntax format, REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] ) name [FORCE]; Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ] VACUUM puts the options before the table name, so ISTM it'd be best to keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or REINDEX {INDEX | ...} (options). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 2/3/15 9:50 AM, David Steele wrote: On 2/3/15 10:01 AM, José Luis Tallón wrote: Hmmm alter column set storage external / set storage extended ? From http://www.postgresql.org/docs/9.4/static/sql-altertable.html : ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } This would do what you described, right? EXTENDED is the default for most TOAST-able types and is still subject to TOAST_TUPLE_THRESHOLD which is normally 2K. EXTERNAL is the same but with no compression. See: http://www.postgresql.org/docs/9.4/static/storage-toast.html Right. I'd like to be able to set per-column TOAST_TUPLE_THRESHOLD. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables
On 2/3/15 9:01 AM, Tom Lane wrote: Matthew Kelly mke...@tripadvisor.com writes: However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids. I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn’t protect you from this potential problem. That may be a hazard, but ... That being said I’d be perfectly happy merely giving each TOAST table its own sequence as that almost entirely mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional work around for those of us with larger than average installs. ... this fix would actually make things enormously worse. With the single counter feeding all tables, you at least have a reasonable probability that there are not enormously long runs of consecutive OIDs in any one toast table. With a sequence per table, you are nearly guaranteed that there are such runs, because inserts into other tables don't create a break. (This effect is also why you're wrong to claim that partitioning can't fix it.) That's assuming that toasting is evenly spread between tables. In my experience, that's not a great bet... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add LINE: hint when schemaname.typename is a non-existent schema
On Mon, Feb 2, 2015 at 2:44 PM, Ryan Kelly rpkell...@gmail.com wrote: The attached patch adds a LINE: ... hint when schemaname.typename results in a schema which does not exist. I came across this when a missing comma in a SELECT list resulted in an error without a location in a query a few thousand lines long. Before: (postgres@[local]:5432 14:41:25) [postgres] select test.id 'all' as example from test; ERROR: 3F000: schema test does not exist LOCATION: get_namespace_oid, namespace.c:2826 After: (postgres@[local]:5433 14:42:32) [postgres] select test.id 'all' as example from test; ERROR: 3F000: schema test does not exist LINE 1: select test.id 'all' as example from test; ^ LOCATION: LookupTypeName, parse_type.c:171 -Ryan Kelly Please add your patch to https://commitfest.postgresql.org/action/commitfest_view/open so we don't forget about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On 2015-02-03 10:41:04 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Additionally I think we should change the default for wal_level to hot_standby and max_wal_senders (maybe to 5). That way users can use pg_basebackup and setup streaming standbys without having to restart the primary. I think that'd be an important step in making setup easier. I always thought the reason for defaulting to minimal was performance. I'd like to see proof that the impact of wal_level = hot_standby is negligible before we consider doing this. Well, it really depends on what you're doing. The cases where minimal is beneficial are when you COPY into a table that's been created in the same (sub)xact or rewrite it. Other than that there's not really a difference? The argument that having to change one more GUC is an undue burden while configuring hot standby seems ridiculous from here. HS is not nearly push the EASY button and you're done, and this change wouldn't make it so. But pg_basebackup is pretty close to being that easy, isn't it? There's still pg_hba.conf to deal with, but other than that... And with a little bit more work (safely autocreating replication slots, so WAL files aren't getting removed prematurely), we can make HS simpler as well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
Andres Freund and...@2ndquadrant.com writes: On 2015-02-03 10:20:03 -0500, Tom Lane wrote: Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ] option := FORCE | VERBOSE We'd still keep the historical syntax where you can write FORCE outside parens, but it'd be deprecated. Why would we allow force inside the parens, given it's a backward compat only thing afaik? Don't get me wrong, I'm not at all against an extensible syntax, I just don't see a point in further cargo culting FORCE. Ah, I'd forgotten that that option was now a no-op. Yeah, there's no reason to support it in the new syntax. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release note bloat is getting out of hand
On 2/3/15 10:11 AM, Marko Tiikkaja wrote: And now that we're on the subject of ponies, it would be nice if the relevant git hashes were included as well. That's probably not going to happen. A release-note entry is often the combination of many commits, and accurately tracking those is a lot of work. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On Tue, Feb 3, 2015 at 7:43 AM, Andres Freund and...@2ndquadrant.com wrote: I think these days there's no reason for the split between the archive and hot_standby wal levels. The split was made out of volume and stability concerns. I think we can by now be confident about the wal_level = hot_standby changes (note I'm not proposing hot_standby = on). So let's remove the split. It just gives users choice between two options that don't have a meaningful difference. Additionally I think we should change the default for wal_level to hot_standby and max_wal_senders (maybe to 5). That way users can use pg_basebackup and setup streaming standbys without having to restart the primary. I think that'd be an important step in making setup easier. Previously there have been arguments against changing the default of wal_level because it'd mean the regression tests wouldn't exercise minimal anymore. That might be true, but then right now we just don't exercise the more complex levels. If we're really concerned we can just force a different value during the tests, just as we do for prepared xacts. Comments? Additionally, more complex and further into the future, I wonder if we couldn't also get rid of wal_level = logical by automatically switching to it whenever logical slots are active. I think my vote is to maintain the status quo. What you're basically proposing to do is ship the system half-configured for replication, and I don't see the point of that. The people who want replication still have to do the rest of the setup anyway, and the people who don't want replication are losing the benefits of wal_level=minimal for no real gain. In particular, they are losing the ability to skip WAL-logging when bulk-loading a just-created table, which is not a small thing. I'm fairly sure we have customers who benefit significantly from that behavior. I agree that wal_level=archive doesn't serve much purpose at this point. 
I guess I wouldn't object to removing that, although I can't see much benefit to doing so, either. Crazy ideas: Could we make wal_level something other than PGC_POSTMASTER? PGC_SIGHUP would be nice... Could we, maybe, even make it a derived value rather than one that is explicitly configured? Like, if you set max_wal_senders > 0, you automatically get wal_level=hot_standby? If you register a logical replication slot, you automatically get wal_level=logical? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting rid of LSEG.m
On 02/03/2015 06:47 PM, Tom Lane wrote: Or perhaps we should just remove both the field and the ifdef'd assignments. That's a bit more drastic but I can't really see this code ever coming back to life ... especially since the notion of a field that's not stored on disk but is valid in in-memory copies seems impossibly error-prone. Most functions can have no idea whether their input is residing in a disk buffer or not. And adding the bookkeeping to determine that would surely cost more than just recomputing the slope when needed. +1 for removing it altogether. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release note bloat is getting out of hand
On 2/3/15 12:32 AM, Tom Lane wrote: It would not make that much of a difference in tarball size, agreed. It *would* make a difference in the build time and output size of the SGML docs --- as I mentioned at the outset, the release notes currently account for 25% of the SGML source linecount. We could make the release notes a separate top-level document. Then it could be built in parallel with the documentation. There are only two links from the rest of the docs into the release notes, so there wouldn't be much harm. Doing this would also solve an unrelated issue: Sometimes, when I run a spell or markup checker over the documentation, the old release notes tend to get flagged for many things, because we don't edit those after the release. Separating the actively maintained documentation from the it-is-what-it-is release notes would make that distinction clearer. The web site would need to be reorganized slightly, but I can imagine that in the long run this could also be a win. Note also that you only need to present the release notes from the latest stable release branch on the web site, as opposed to documentation for each branch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add LINE: hint when schemaname.typename is a non-existent schema
Ryan Kelly rpkell...@gmail.com writes: The attached patch adds a LINE: ... hint when schemaname.typename results in a schema which does not exist. I came across this when a missing comma in a SELECT list resulted in an error without a location in a query a few thousand lines long. I think this is a good problem to attack, but the proposed patch seems a bit narrow and brute-force. Surely this isn't the only place where we're missing an errposition pointer on a no such schema error? It's probably not reasonable to add a pstate argument to LookupExplicitNamespace itself, given that namespace.c isn't part of the parser. But you could imagine adding an interface function in the parser that calls LookupExplicitNamespace and then throws a position-aware error on failure, and then making sure that all schema lookups in the parser go through that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Getting rid of LSEG.m
I noticed a Coverity complaint about the m field of LSEG being uninitialized, which indeed it is because all the places that would set it are ifdef'd out. So why didn't the field itself get ifdef'd out as well? Or perhaps we should just remove both the field and the ifdef'd assignments. That's a bit more drastic but I can't really see this code ever coming back to life ... especially since the notion of a field that's not stored on disk but is valid in in-memory copies seems impossibly error-prone. Most functions can have no idea whether their input is residing in a disk buffer or not. And adding the bookkeeping to determine that would surely cost more than just recomputing the slope when needed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release note bloat is getting out of hand
Peter Eisentraut pete...@gmx.net writes: Note also that you only need to present the release notes from the latest stable release branch on the web site, as opposed to documentation for each branch. Yeah, JD suggested the same upthread. If we went over to a separate document containing all the historical notes, then it would make sense for the main documentation to contain only release notes for the current branch, which would further reduce its build time. My thread-starting proposal of keeping the last five branches was based on the assumption that we didn't need any whole-history document, but if we're keeping one separately then this seems to make the most sense. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 2/3/15 10:01 AM, José Luis Tallón wrote: Hmmm alter column set storage external / set storage extended ? From http://www.postgresql.org/docs/9.4/static/sql-altertable.html : ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } This would do what you described, right? EXTENDED is the default for most TOAST-able types and is still subject to TOAST_TUPLE_THRESHOLD which is normally 2K. EXTERNAL is the same but with no compression. See: http://www.postgresql.org/docs/9.4/static/storage-toast.html -- - David Steele da...@pgmasters.net
Re: [HACKERS] Redesigning checkpoint_segments
On Tue, Feb 3, 2015 at 10:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Works for me. However, note that max_checkpoint_segments = 10 doesn't mean the same as current checkpoint_segments = 10. With checkpoint_segments = 10 you end up with about 2x-3x as much WAL as with max_checkpoint_segments = 10. So the everybody's used to it argument doesn't hold much water. Hmm, that's surprising. Why does that happen? That's the whole point of this patch. max_checkpoint_segments = 10, or max_checkpoint_segments = 160 MB, means that the system will begin a checkpoint so that when the checkpoint completes, and it truncates away or recycles old WAL, the total size of pg_xlog is 160 MB. That's different from our current checkpoint_segments setting. With checkpoint_segments, the documented formula for calculating the disk usage is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's a lot less intuitive to set. Hmm, that's different from what I was thinking. We probably shouldn't call that max_checkpoint_segments, then. I got confused and thought you were just trying to decouple the number of segments that it takes to trigger a checkpoint from the number we keep preallocated. But I'm confused: how can we know how much new WAL will be written before the checkpoint completes? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release note bloat is getting out of hand
Peter Eisentraut pete...@gmx.net writes: On 2/3/15 10:11 AM, Marko Tiikkaja wrote: And now that we're on the subject of ponies, it would be nice if the relevant git hashes were included as well. That's probably not going to happen. A release-note entry is often the combination of many commits, and accurately tracking those is a lot of work. Well, actually, modern release note entries tend to look like

<!--
Author: Andres Freund <and...@anarazel.de>
Branch: master [3fabed070] 2015-01-07 00:19:37 +0100
Branch: REL9_4_STABLE [7da102154] 2015-01-07 00:24:58 +0100

Author: Andres Freund <and...@anarazel.de>
Branch: master [31912d01d] 2015-01-07 00:18:00 +0100
Branch: REL9_4_STABLE [84911ff51] 2015-01-07 00:24:47 +0100
-->
<listitem>
 <para>
  Assorted fixes for logical decoding (Andres Freund)
 </para>
</listitem>

because they're prepared on the basis of src/tools/git_changelog output and so having the git hashes is merely a matter of not deleting that info after we're done writing the user-visible text. So I could imagine some tool that presents that info. I'm not volunteering to write it though ... regards, tom lane
Re: [HACKERS] Redesigning checkpoint_segments
On 03/02/15 16:50, Robert Haas wrote: On Tue, Feb 3, 2015 at 10:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: That's the whole point of this patch. max_checkpoint_segments = 10, or max_checkpoint_segments = 160 MB, means that the system will begin a checkpoint so that when the checkpoint completes, and it truncates away or recycles old WAL, the total size of pg_xlog is 160 MB. That's different from our current checkpoint_segments setting. With checkpoint_segments, the documented formula for calculating the disk usage is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's a lot less intuitive to set. Hmm, that's different from what I was thinking. We probably shouldn't call that max_checkpoint_segments, then. I got confused and thought you were just trying to decouple the number of segments that it takes to trigger a checkpoint from the number we keep preallocated. But I'm confused: how can we know how much new WAL will be written before the checkpoint completes? The preallocation is based on estimated size of next checkpoint which is basically running average of the previous checkpoints with some additional adjustments for unsteady behavior (last checkpoint has higher weight in the formula). (we also still internally have the CheckPointSegments which is calculated the way Heikki described above) In any case, I don't like the max_checkpoint_segments naming too much, and I don't even like the number of segments as limit too much, I think the ability to set this in actual size is quite nice property of this patch and as Heikki says the numbers don't map that well to the old ones in practice. I did some code reading and I do like the patch. Basically only negative thing I can say is that I am not big fan of _logSegNo variable name but that's not new in this patch, we use it all over the place in xlog. 
I would vote for a bigger default for checkpoint_wal_size (or whatever it will be named) though, since the current one is not much bigger in practice than what we have now, and that one is way too conservative. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Redesigning checkpoint_segments
On 02/03/2015 05:19 PM, Robert Haas wrote: On Tue, Feb 3, 2015 at 7:31 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/02/2015 03:36 PM, Robert Haas wrote: Second, I *think* that these settings are symmetric and, if that's right, then I suggest that they ought to be named symmetrically. Basically, I think you've got min_checkpoint_segments (the number of recycled segments we keep around always) and max_checkpoint_segments (the maximum number of segments we can have between checkpoints), essentially splitting the current role of checkpoint_segments in half. I'd go so far as to suggest we use exactly that naming. It would be reasonable to allow the value to be specified in MB rather than in 16MB units, and to specify it that way by default, but maybe a unit-less value should have the old interpretation since everybody's used to it. That would require adding GUC_UNIT_XSEG or similar, but that seems OK. Works for me. However, note that max_checkpoint_segments = 10 doesn't mean the same as current checkpoint_segments = 10. With checkpoint_segments = 10 you end up with about 2x-3x as much WAL as with max_checkpoint_segments = 10. So the "everybody's used to it" argument doesn't hold much water. Hmm, that's surprising. Why does that happen? That's the whole point of this patch. max_checkpoint_segments = 10, or max_checkpoint_segments = 160 MB, means that the system will begin a checkpoint so that when the checkpoint completes, and it truncates away or recycles old WAL, the total size of pg_xlog is 160 MB. That's different from our current checkpoint_segments setting. With checkpoint_segments, the documented formula for calculating the disk usage is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's a lot less intuitive to set. - Heikki
Re: [HACKERS] SSL information view
On Tue, Feb 3, 2015 at 6:42 AM, Michael Paquier michael.paqu...@gmail.com wrote: Where are we on this patch? No new version has been provided and there have been comments provided by Heikki here (5491e547.4040...@vmware.com) and by Alexei here (87ppbqz00h@commandprompt.com). Yeah, it's on my shame list. I'm definitely planning to get a new version in before the next CF and to try to work quicker with it that time to finish it off on time. Thanks for the reminder! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PageRepairFragmentation performance
On 01/31/2015 01:49 AM, Peter Geoghegan wrote: The refactoring patch certainly looks very reasonable. Ok, committed the refactoring part for now. Thanks for the review. - Heikki
Re: [HACKERS] Small memory leak in execute.c of ECPG driver
On Tue, Feb 3, 2015 at 5:35 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think there are more similar leaks nearby. After the first hunk, there's another if-check with return false that also leaks mallocedval. Right after the two other hunks, if the ecpg_realloc fails, we again leak mallocedval. Yes, I found some extra ones by re-reading the code again with newcopy (2) as well as mallocedval (1) as you mentioned. I wonder why Coverity didn't warn about those? Maybe it would've, after fixing the first ones. Hard to say. Perhaps it gives up after finding one failure in a code path, or perhaps it would have found it after a version update.. In any case, an updated patch is attached. -- Michael

From 4e195f162d879bf563fec052710dd10ccdc4a89a Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Tue, 3 Feb 2015 15:48:16 +0900
Subject: [PATCH] Fix memory leak in ecpg driver

Issue pointed out by Coverity.
---
 src/interfaces/ecpg/ecpglib/execute.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/interfaces/ecpg/ecpglib/execute.c b/src/interfaces/ecpg/ecpglib/execute.c
index 8a3dd75..5d26af7 100644
--- a/src/interfaces/ecpg/ecpglib/execute.c
+++ b/src/interfaces/ecpg/ecpglib/execute.c
@@ -803,7 +803,10 @@ ecpg_store_input(const int lineno, const bool force_indicator, const struct vari
 			mallocedval = quote_postgres(newcopy, quote, lineno);
 			if (!mallocedval)
+			{
+				ecpg_free(newcopy);
 				return false;
+			}
 
 			*tobeinserted_p = mallocedval;
 		}
@@ -835,7 +838,10 @@ ecpg_store_input(const int lineno, const bool force_indicator, const struct vari
 			mallocedval = quote_postgres(newcopy, quote, lineno);
 			if (!mallocedval)
+			{
+				ecpg_free(newcopy);
 				return false;
+			}
 
 			*tobeinserted_p = mallocedval;
 		}
@@ -859,7 +865,10 @@ ecpg_store_input(const int lineno, const bool force_indicator, const struct vari
 				nval = PGTYPESnumeric_new();
 				if (!nval)
+				{
+					ecpg_free(mallocedval);
 					return false;
+				}
 
 				if (var->type == ECPGt_numeric)
 					result = PGTYPESnumeric_copy((numeric *) ((var + var->offset * element)->value), nval);
@@ -869,6 +878,7 @@ ecpg_store_input(const int lineno, const bool force_indicator, const struct vari
 				if (result != 0)
 				{
 					PGTYPESnumeric_free(nval);
+					ecpg_free(mallocedval);
 					return false;
 				}
@@ -940,7 +950,10 @@ ecpg_store_input(const int lineno, const bool force_indicator, const struct vari
 				{
 					str = quote_postgres(PGTYPESinterval_to_asc((interval *) ((var + var->offset * element)->value)), quote, lineno);
 					if (!str)
+					{
+						ecpg_free(mallocedval);
 						return false;
+					}
 
 					slen = strlen(str);
 
 					if (!(mallocedval = ecpg_realloc(mallocedval, strlen(mallocedval) + slen + 2, lineno)))
@@ -991,7 +1004,10 @@ ecpg_store_input(const int lineno, const bool force_indicator, const struct vari
 				{
 					str = quote_postgres(PGTYPESdate_to_asc(*(date *) ((var + var->offset * element)->value)), quote, lineno);
 					if (!str)
+					{
+						ecpg_free(mallocedval);
 						return false;
+					}
 
 					slen = strlen(str);
 
 					if (!(mallocedval = ecpg_realloc(mallocedval, strlen(mallocedval) + slen + 2, lineno)))
-- 
2.2.2
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On 2015-02-03 13:51:25 +0100, Magnus Hagander wrote: Those who want to optimize their WAL size can set it back to minimal, but let's make the default the one that makes life *easy* for people. Precisely. New users won't usually have tremendous stuff to load in the specific circumstances in which minimal makes a difference. And pretty much everyone uses base backups / replication outside of initial loading anyway. The other option, which would be more complicated (I have a semi-finished patch that I never got time to clean up) would be for pg_basebackup to be able to dynamically raise the value of wal_level during its run. It would not help with the streaming standby part, but it would help with pg_basebackup. That could be useful independently - for those who prefer using wal_level=minimal and also pg_basebackup.. There are some ugly corner cases in making that happen. I'm sure we could do it, but I'm really not convinced it's worth the effort. Previously there have been arguments against changing the default of wal_level because it'd mean the regression tests wouldn't exercise minimal anymore. That might be true, but then right now we just don't exercise the more complex levels. If we're really concerned we can just force a different value during the tests, just as we do for prepared xacts. Seems we should focus our tests on the stuff that people actually use in reality? :) And if we change the default, then even more people will use that level. Agreed. It's not my argument ;) But it would definitely be a good idea to have some buildfarm animals set up to test each one. .oO(make it random in the buildfarm code) Additionally, more complex and further into the future, I wonder if we couldn't also get rid of wal_level = logical by automatically switching to it whenever logical slots are active. If it can be safely done online, I definitely think that would be a good goal to have. I think it could at least be made to work on the primary. 
I don't really see how on a standby (which we don't support yet anyway). If we could do the same for hot_standby if you had physical slots, that might also be a good idea? I think it's slightly more complicated there. We'd have to delay slot creation till after a checkpoint and such, which doesn't seem that desirable. I think it's more interesting for logical anyway - there are some common workloads where it actually does imply some (not large, mind you) overhead. So we can't just change to it as a default. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On Tue, Feb 3, 2015 at 9:43 PM, Andres Freund and...@2ndquadrant.com wrote: I think these days there's no reason for the split between the archive and hot_standby wal levels. The split was made out of volume and stability concerns. I think we can by now be confident about the wal_level = hot_standby changes (note I'm not proposing hot_standby = on). +1. So let's remove the split. It just gives users choice between two options that don't have a meaningful difference. The last time I mentioned something similar (purely removing archive from wal_level, CA+TgmoaTG9U4=a_bs8sbdemm2+fapqhzujhj7f-npfy+bns...@mail.gmail.com), two additional suggestions were made as well: - Keep archive and make it mean archive = hot_standby - Do nothing, to still let the users do what they think is better and not what we think is better. Perhaps times have changed since... I guess that you mean making both values become equivalent, right? -- Michael
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On 03/02/15 13:51, Magnus Hagander wrote: On Tue, Feb 3, 2015 at 1:43 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, I think these days there's no reason for the split between the archive and hot_standby wal levels. The split was made out of volume and stability concerns. I think we can by now be confident about the wal_level = hot_standby changes (note I'm not proposing hot_standby = on). So let's remove the split. It just gives users choice between two options that don't have a meaningful difference. +1. +1 too Additionally I think we should change the default for wal_level to hot_standby and max_wal_senders (maybe to 5). That way users can use pg_basebackup and set up streaming standbys without having to restart the primary. I think that'd be an important step in making setup easier. Yes, please! Those who want to optimize their WAL size can set it back to minimal, but let's make the default the one that makes life *easy* for people. The other option, which would be more complicated (I have a semi-finished patch that I never got time to clean up) would be for pg_basebackup to be able to dynamically raise the value of wal_level during its run. It would not help with the streaming standby part, but it would help with pg_basebackup. That could be useful independently - for those who prefer using wal_level=minimal and also pg_basebackup.. This is not that easy to do, let's do it one step at a time. Comments? Additionally, more complex and further into the future, I wonder if we couldn't also get rid of wal_level = logical by automatically switching to it whenever logical slots are active. If it can be safely done online, I definitely think that would be a good goal to have. If we could do the same for hot_standby if you had physical slots, that might also be a good idea? +many for the logical, physical would be nice but I think it's again in the category of not so easy and maybe better as a next step, if at all. 
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On 2015-02-03 21:58:44 +0900, Michael Paquier wrote: On Tue, Feb 3, 2015 at 9:43 PM, Andres Freund and...@2ndquadrant.com wrote: I think these days there's no reason for the split between the archive and hot_standby wal levels. The split was made out of volume and stability concerns. I think we can by now be confident about the wal_level = hot_standby changes (note I'm not proposing hot_standby = on). +1. So let's remove the split. It just gives users choice between two options that don't have a meaningful difference. The last time I mentioned something similar (purely removing archive from wal_level CA+TgmoaTG9U4=a_bs8sbdemm2+fapqhzujhj7f-npfy+bns...@mail.gmail.com), two additional suggestions were made as well: - Keep archive and make it mean archive = hot_standby That's actually what I was thinking of implementing. I.e. accept archive, but output hot_standby. - Do nothing, to still let the users do what they think is better and not what we think is better. I'd rather remove the supporting code that takes different branches based on those. Perhaps times have changed since... I don't think the arguments back then made all that much sense. But perhaps they're also less important if we change the default. For me, removing an option whose only effects are saving a couple of bytes every now and then and adding to the testing matrix isn't 'nannyism'. It's removing pointless choice. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Unlikely-to-happen crash in ecpg driver caused by NULL-pointer check not done
On Tue, Feb 3, 2015 at 7:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. Since the ecpg_add_mem call is done after setting (*(void **) var), that's left to point to already-free'd memory. The other call sites have a similar issue. I haven't analyzed the code to check if that's harmless or not, but seems better to not do that. Right, it is an error to allocate this memory if there is a risk that it may be freed. Hence fixed. In ecpg_add_mem, the ecpg_raise() call is unnecessary, since ecpg_alloc already does that on failure. Right, check. (It would be less error-prone to have an ecpg_alloc_auto() function that combines ecpg_alloc+ecpg_add_mem in one call.) It makes sense. /* Here are some methods used by the lib. */ /* Returns a pointer to a string containing a simple type name. */ That second comment is completely bogus. It's not this patch's fault, it's been like that forever, but I just happened to notice.. Corrected. All those things are addressed in the patch attached. -- Michael

From 206dc0dd95a83e056968ac822f7b9331ffcf82cd Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Tue, 3 Feb 2015 16:21:50 +0900
Subject: [PATCH] Fix unlikely-to-happen crash in ecpg_add_mem

This routine called ecpg_alloc to allocate memory but did not actually
check the returned pointer, which is the result of a malloc call and
can therefore be NULL. Issue noted by Coverity, though I guessed at the
legwork needed here.
---
 src/interfaces/ecpg/ecpglib/descriptor.c | 6 ++
 src/interfaces/ecpg/ecpglib/execute.c    | 8 +++-
 src/interfaces/ecpg/ecpglib/extern.h     | 4 ++--
 src/interfaces/ecpg/ecpglib/memory.c     | 22 +-
 4 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/src/interfaces/ecpg/ecpglib/descriptor.c b/src/interfaces/ecpg/ecpglib/descriptor.c
index b2990ca..956c035 100644
--- a/src/interfaces/ecpg/ecpglib/descriptor.c
+++ b/src/interfaces/ecpg/ecpglib/descriptor.c
@@ -432,7 +432,7 @@ ECPGget_desc(int lineno, const char *desc_name, int index,...)
 		/* allocate storage if needed */
 		if (arrsize == 0 && *(void **) var == NULL)
 		{
-			void *mem = (void *) ecpg_alloc(offset * ntuples, lineno);
+			void *mem = (void *) ecpg_auto_alloc(offset * ntuples, lineno);
 
 			if (!mem)
 			{
@@ -440,7 +440,6 @@ ECPGget_desc(int lineno, const char *desc_name, int index,...)
 				return false;
 			}
 			*(void **) var = mem;
-			ecpg_add_mem(mem, lineno);
 			var = mem;
 		}
@@ -510,7 +509,7 @@ ECPGget_desc(int lineno, const char *desc_name, int index,...)
 			/* allocate storage if needed */
 			if (data_var.ind_arrsize == 0 && data_var.ind_value == NULL)
 			{
-				void *mem = (void *) ecpg_alloc(data_var.ind_offset * ntuples, lineno);
+				void *mem = (void *) ecpg_auto_alloc(data_var.ind_offset * ntuples, lineno);
 
 				if (!mem)
 				{
@@ -518,7 +517,6 @@ ECPGget_desc(int lineno, const char *desc_name, int index,...)
 					return false;
 				}
 				*(void **) data_var.ind_pointer = mem;
-				ecpg_add_mem(mem, lineno);
 				data_var.ind_value = mem;
 			}
diff --git a/src/interfaces/ecpg/ecpglib/execute.c b/src/interfaces/ecpg/ecpglib/execute.c
index 8a3dd75..46d884f 100644
--- a/src/interfaces/ecpg/ecpglib/execute.c
+++ b/src/interfaces/ecpg/ecpglib/execute.c
@@ -398,11 +398,10 @@ ecpg_store_result(const PGresult *results, int act_field,
 		}
 		ecpg_log("ecpg_store_result on line %d: allocating memory for %d tuples\n", stmt->lineno, ntuples);
-		var->value = (char *) ecpg_alloc(len, stmt->lineno);
+		var->value = (char *) ecpg_auto_alloc(len, stmt->lineno);
 		if (!var->value)
 			return false;
 		*((char **) var->pointer) = var->value;
-		ecpg_add_mem(var->value, stmt->lineno);
 	}
 
 	/* allocate indicator variable if needed */
@@ -410,11 +409,10 @@ ecpg_store_result(const PGresult *results, int act_field,
 	{
 		int			len = var->ind_offset * ntuples;
 
-		var->ind_value = (char *) ecpg_alloc(len, stmt->lineno);
+		var->ind_value = (char *) ecpg_auto_alloc(len, stmt->lineno);
 		if (!var->ind_value)
 			return false;
 		*((char **) var->ind_pointer) = var->ind_value;
-		ecpg_add_mem(var->ind_value, stmt->lineno);
 	}
 
 	/* fill the variable with the tuple(s) */
diff --git a/src/interfaces/ecpg/ecpglib/extern.h b/src/interfaces/ecpg/ecpglib/extern.h
index 3836007..2b670e0 100644
--- a/src/interfaces/ecpg/ecpglib/extern.h
+++ b/src/interfaces/ecpg/ecpglib/extern.h
@@ -136,8 +136,7 @@ extern struct var_list *ivlist;
 
 /* Here are some methods used by the lib. */
 
-/* Returns a pointer to a string containing a simple type name. */
-void ecpg_add_mem(void *ptr, int lineno);
+bool ecpg_add_mem(void *ptr, int lineno);
 
 bool ecpg_get_data(const PGresult *, int, int, int, enum ECPGttype type,
 		enum ECPGttype, char *, char *, long, long, long,
@@ -148,6 +147,7 @@ void ecpg_pthreads_init(void);
 #endif
 struct connection *ecpg_get_connection(const char *);
 char
Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders
On Tue, Feb 3, 2015 at 1:43 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, I think these days there's no reason for the split between the archive and hot_standby wal levels. The split was made out of volume and stability concerns. I think we can by now be confident about the wal_level = hot_standby changes (note I'm not proposing hot_standby = on). So let's remove the split. It just gives users choice between two options that don't have a meaningful difference. +1. Additionally I think we should change the default for wal_level to hot_standby and max_wal_senders (maybe to 5). That way users can use pg_basebackup and set up streaming standbys without having to restart the primary. I think that'd be an important step in making setup easier. Yes, please! Those who want to optimize their WAL size can set it back to minimal, but let's make the default the one that makes life *easy* for people. The other option, which would be more complicated (I have a semi-finished patch that I never got time to clean up) would be for pg_basebackup to be able to dynamically raise the value of wal_level during its run. It would not help with the streaming standby part, but it would help with pg_basebackup. That could be useful independently - for those who prefer using wal_level=minimal and also pg_basebackup.. Previously there have been arguments against changing the default of wal_level because it'd mean the regression tests wouldn't exercise minimal anymore. That might be true, but then right now we just don't exercise the more complex levels. If we're really concerned we can just force a different value during the tests, just as we do for prepared xacts. Seems we should focus our tests on the stuff that people actually use in reality? :) And if we change the default, then even more people will use that level. But it would definitely be a good idea to have some buildfarm animals set up to test each one. Comments? 
Additionally, more complex and further into the future, I wonder if we couldn't also get rid of wal_level = logical by automatically switching to it whenever logical slots are active. If it can be safely done online, I definitely think that would be a good goal to have. If we could do the same for hot_standby if you had physical slots, that might also be a good idea? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Release note bloat is getting out of hand
Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: On 02/02/2015 05:39 PM, Peter Eisentraut wrote: I share the sentiment that the release notes *seem* too big, but the subsequent discussion shows that it's not clear why that's really a problem. Exactly what problem are we trying to fix? At a rough count of lines, the release notes for unsupported versions are about 18% of documentation overall (47K out of 265K lines). So they're not insubstantial. Compared to the total size of the tarball, though ... It would not make that much of a difference in tarball size, agreed. It *would* make a difference in the build time and output size of the SGML docs --- as I mentioned at the outset, the release notes currently account for 25% of the SGML source linecount. I run `make -s -j4 world` on my i7 fairly often, and it is often the doc build that I wind up waiting for at the end. FWIW, my preference would be that unless you choose a special "all release notes" build, it only build release notes for supported versions and only up to the point that the branch being built split off. That way, you don't have to work at it to see whether the release notes from an older branch duplicate the same bug fixes as a later branch that is listed. I think it makes sense to put the "all release notes" page up on our web site. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Release note bloat is getting out of hand
On 02/03/2015 08:55 AM, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: On 02/02/2015 05:39 PM, Peter Eisentraut wrote: I share the sentiment that the release notes *seem* too big, but the subsequent discussion shows that it's not clear why that's really a problem. Exactly what problem are we trying to fix? At a rough count of lines, the release notes for unsupported versions are about 18% of documentation overall (47K out of 265K lines). So they're not insubstantial. Compared to the total size of the tarball, though ... It would not make that much of a difference in tarball size, agreed. It *would* make a difference in the build time and output size of the SGML docs --- as I mentioned at the outset, the release notes currently account for 25% of the SGML source linecount. I run `make -s -j4 world` on my i7 fairly often, and it is often the doc build that I wind up waiting for at the end. I realize this is slightly OT, but I wonder if it might be worth having targets that build and install everything but the docs. cheers andrew
Re: [HACKERS] Redesigning checkpoint_segments
On 02/02/2015 03:36 PM, Robert Haas wrote: Second, I *think* that these settings are symmetric and, if that's right, then I suggest that they ought to be named symmetrically. Basically, I think you've got min_checkpoint_segments (the number of recycled segments we keep around always) and max_checkpoint_segments (the maximum number of segments we can have between checkpoints), essentially splitting the current role of checkpoint_segments in half. I'd go so far as to suggest we use exactly that naming. It would be reasonable to allow the value to be specified in MB rather than in 16MB units, and to specify it that way by default, but maybe a unit-less value should have the old interpretation since everybody's used to it. That would require adding GUC_UNIT_XSEG or similar, but that seems OK. Works for me. However, note that max_checkpoint_segments = 10 doesn't mean the same as current checkpoint_segments = 10. With checkpoint_segments = 10 you end up with about 2x-3x as much WAL as with max_checkpoint_segments = 10. So the "everybody's used to it" argument doesn't hold much water. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: TRAP: FailedAssertion(!(node->spec != SPEC_INSERT || node->arbiterIndex != ((Oid) 0)), File: nodeModifyTable.c, Line: 1619) Is that just because of the hack in parse_clause.c? Yes. I never built with assertions and so didn't see this, but it doesn't matter. With assertions disabled, count_upsert_exclusion.pl ran successfully to the end. I also tried running VACUUM FREEZE upsert_race_test in a loop in another session at the same time, but it didn't make a difference. How quickly do you see the errors? I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set jj_xid=1 to increase XID burn rate, but I'm still not seeing any errors. Did you build fully optimized, assertion-free code? I've been doing that. I found it necessary to recreate some of the bugs Jeff's tool caught. I also think that I might have needed an 8 core box to see it, but less sure about that. -- Peter Geoghegan
Re: [HACKERS] PQgetssl() and alternative SSL implementations
On 01/28/2015 08:05 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Right, that was the idea. I wanted it to include the word OpenSSL, to make it clear in the callers that it's specific to OpenSSL. And SSL, because that's the name of the struct. I agree it looks silly, though. One idea is to have two separate arguments: the implementation name, and the struct name. PQgetSSLstruct(ssl, "OpenSSL", "SSL") would look less silly. That's probably overkill. Why not establish a convention that the main API struct for the library doesn't have to be named? So it's just PQgetSSLstruct(ssl, "OpenSSL"), and you only need strange naming if you're dealing with a library that actually has more than one API object that needs to be fetched this way. (That set is likely empty...) Works for me. Committed that way. - Heikki
[HACKERS] Re: [COMMITTERS] pgsql: Process 'die' interrupts while reading/writing from the client s
On 02/03/2015 11:51 PM, Andres Freund wrote: + * This is called just after low-level writes. That might be after the read + * finished successfully, or it was interrupted via interrupt. 'blocked' tells + * us whether the Looks like you forgot to complete that sentence before pushing... Also, nightjar is unhappy with this: TRAP: FailedAssertion(!(InterruptHoldoffCount == 0 && CritSectionCount == 0), File: /pgbuild/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c, Line: 574) I think that assertion in ProcessClientWriteInterrupt() should be simply removed. ProcessClientWriteInterrupt() copes just fine, i.e. does nothing, if we're in a critical section or interrupts are held. - Heikki
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On Tue, Feb 3, 2015 at 8:26 PM, Jim Nasby jim.na...@bluetreble.com wrote: On 2/3/15 9:20 AM, Tom Lane wrote: e.g., I will add the following syntax format, REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] ) name [FORCE]; Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ] VACUUM puts the options before the table name, so ISTM it'd be best to keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or REINDEX {INDEX | ...} (options). Makes sense... +1 Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
[HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
When a user is granted USAGE on a foreign server, the psql command \deu+ will show them the username and password bound to the applicable user mapping. To demonstrate (9.3+):

(as a superuser)
# create extension postgres_fdw ;
# create foreign server loopback_server foreign data wrapper postgres_fdw options(host '127.0.0.1', port '5432') ;
# create user mapping for public server loopback_server options(user 'foo', password 'bar') ;

(as a normal user)
\deu+
        List of user mappings
     Server      | User name | FDW Options
-----------------+-----------+-------------
 loopback_server | public    |
(1 row)

So far so good?

select * from dblink('loopback_server', 'select current_date') as x(column1 date) ;
ERROR:  permission denied for foreign server loopback_server

OK, I can't do that now. Let's fix it:

# grant usage on foreign server loopback_server to public ;

select * from dblink('loopback_server', 'select current_date') as x(column1 date) ;
  column1
------------
 2015-02-03
(1 row)

Sweet! But...

\deu+
                List of user mappings
     Server      | User name |         FDW Options
-----------------+-----------+------------------------------
 loopback_server | public    | (user 'foo', password 'bar')
(1 row)

Crap. (FWIW, it doesn't matter whether you grant to PUBLIC or to a specific user, the result is the same.) The obvious objection is, well you should just use foreign tables instead of dblink(). I'll cut a long story short by saying that doesn't work for us. We are using postgres_fdw to allow our analysts to run queries against AWS Redshift and blend those results with tables in our OLTP schema. If you know anything about Redshift, or about analysts, you'll realize immediately why foreign tables are not a viable solution. Surely there are many others in a similar position, where the flexibility offered by dblink() makes it preferable to fixed foreign tables. S... what gives? This seems like a really obvious security hole. I've searched the mailing list archives repeatedly and found zero discussion of this issue. -- Noah Yetter Data Architect/DBA @ Craftsy
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
Jim Nasby jim.na...@bluetreble.com writes: On 2/3/15 9:20 AM, Tom Lane wrote: Well, the object type is not an optional part of the command. It's *necessary*. I was thinking more like REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ] VACUUM puts the options before the table name, so ISTM it'd be best to keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or REINDEX {INDEX | ...} (options). Well, I really really don't like the first of those. IMO the command name is REINDEX INDEX etc, so sticking something in the middle of that is bogus. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Process 'die' interrupts while reading/writing from the client s
On 2015-02-04 00:47:03 +0200, Heikki Linnakangas wrote: On 02/03/2015 11:51 PM, Andres Freund wrote: + * This is called just after low-level writes. That might be after the read + * finished successfully, or it was interrupted via interrupt. 'blocked' tells + * us whether the Looks like you forgot to complete that sentence before pushing... Also, nightjar is unhappy with this: TRAP: FailedAssertion(!(InterruptHoldoffCount == 0 && CritSectionCount == 0), File: /pgbuild/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c, Line: 574) I think that assertion in ProcessClientWriteInterrupt() should be simply removed. ProcessClientWriteInterrupt() copes just fine, i.e. does nothing, if we're in a critical section or interrupts are held. Yea, it really was just a brainfart on my end. It's perfectly normal to be called that way during error handling... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Overhauling our interrupt handling (was Escaping from blocked send() reprised.)
On 02/03/2015 08:54 PM, Andres Freund wrote: * Previously the patchset didn't handle SIGTERM while InteractiveBackend() was reading from the client. It did handle ctrl-c/d, but since getc() isn't interruptible and can't be replaced with latches... The fix for that isn't super pretty: die(): if (DoingCommandRead && whereToSendOutput != DestRemote) ProcessInterrupts(); but imo acceptable for single user mode. That smells an awful lot like ImmediateInterruptOK. Could you use WaitLatchOrSocket to sleep on stdin? It would probably need to be renamed or copied to WaitLatchOrStdin(), or a new flag added to wait on stdin instead of a socket, at least on Windows, but it should work in theory. Might not be worth it if it gets complicated - I can live with the above - but it's a thought. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Overhauling our interrupt handling (was Escaping from blocked send() reprised.)
On 2015-02-03 22:17:22 +0200, Heikki Linnakangas wrote: On 02/03/2015 08:54 PM, Andres Freund wrote: * Previously the patchset didn't handle SIGTERM while InteractiveBackend() was reading from the client. It did handle ctrl-c/d, but since getc() isn't interruptible and can't be replaced with latches... The fix for that isn't super pretty: die(): if (DoingCommandRead && whereToSendOutput != DestRemote) ProcessInterrupts(); but imo acceptable for single user mode. That smells an awful lot like ImmediateInterruptOK. It awfully does, yes. But we're only exiting, and only in single user mode. And only during command read. We don't need all the other complicated stuff around sinval, async, deadlock checking, etc. Could you use WaitLatchOrSocket to sleep on stdin? Unfortunately I don't think so. poll() etc. only work properly on network handles, pipes etc. - but stdin can be a file :(. And I think what exactly happens if it's a file fd isn't super well defined. On Linux the file is always marked as ready, which would probably actually work... I think this is better solved in the long run by getting rid of single user mode, akin to the patch Tom had a year or two back. Adding more complications for it doesn't seem worth it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unlikely-to-happen crash in ecpg driver caused by NULL-pointer check not done
On 02/03/2015 09:28 AM, Michael Paquier wrote: Hi all, In ecpg_add_mem of memory.c, we use ecpg_alloc but there is actually no NULL-pointer check. If an OOM shows up exactly at this point, this is likely to cause a crash. Attached patch adds some extra processing to ecpg_add_mem to check if the allocation fails, and to fail properly if an OOM appears.

--- a/src/interfaces/ecpg/ecpglib/descriptor.c
+++ b/src/interfaces/ecpg/ecpglib/descriptor.c
@@ -440,7 +440,12 @@ ECPGget_desc(int lineno, const char *desc_name, int index,...)
 			return false;
 		}
 		*(void **) var = mem;
-		ecpg_add_mem(mem, lineno);
+		if (!ecpg_add_mem(mem, lineno))
+		{
+			ecpg_free(mem);
+			va_end(args);
+			return false;
+		}
 		var = mem;
 	}

Hmm. Since the ecpg_add_mem call is done after setting (*(void **) var), that's left to point to already-free'd memory. The other call sites have a similar issue. I haven't analyzed the code to check if that's harmless or not, but it seems better to not do that. In ecpg_add_mem, the ecpg_raise() call is unnecessary, since ecpg_alloc already does that on failure. (It would be less error-prone to have an ecpg_alloc_auto() function that combines ecpg_alloc+ecpg_add_mem in one call.)

/* Here are some methods used by the lib. */
/* Returns a pointer to a string containing a simple type name. */
bool	ecpg_add_mem(void *ptr, int lineno);
bool	ecpg_get_data(const PGresult *, int, int, int, enum ECPGttype type, enum ECPGttype, char *, char *, long, long, long, enum ARRAY_TYPE, enum COMPAT_MODE, bool);

That second comment is completely bogus. It's not this patch's fault, it's been like that forever, but I just happened to notice. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On Tue, Feb 3, 2015 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: Sawada Masahiko sawada.m...@gmail.com writes: On Mon, Feb 2, 2015 at 9:21 PM, Michael Paquier michael.paqu...@gmail.com wrote: Now, I think that it may be better to provide the keyword VERBOSE before the type of object reindexed as REINDEX [ VERBOSE ] object. Actually, my first WIP version of the patch added the VERBOSE word before the type of object. I'm finding it difficult to decide on the position of the VERBOSE word in the REINDEX statement. The way that FORCE was added to REINDEX was poorly thought out; let's not double down on that with another option added without any consideration for future expansion. I'd be happier if we adopted something similar to the modern syntax for VACUUM and EXPLAIN, ie, comma-separated options in parentheses. Understood. I'm imagining the new REINDEX syntax as follows: - REINDEX (INDEX, VERBOSE) hoge_idx; - REINDEX (TABLE) hoge_table; e.g., I will add the following syntax format, REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] ) name [FORCE]; Thoughts? Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 2/3/15 5:27 PM, Jim Nasby wrote: On 2/3/15 9:50 AM, David Steele wrote: EXTENDED is the default for most TOAST-able types and is still subject to TOAST_TUPLE_THRESHOLD which is normally 2K. EXTERNAL is the same but with no compression. See: http://www.postgresql.org/docs/9.4/static/storage-toast.html Right. I'd like to be able to set per-column TOAST_TUPLE_THRESHOLD. No argument there. There are some columns that I would prefer to always TOAST because even 2K can be very big for some use cases. -- - David Steele da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Performance improvement for joins where outer side is unique
Hi, I had a look at this patch. Although I haven't understood all of the stuff and all of the related things, I will comment as much as I can.

Performance: I looked at the performance gain this patch gives. For several on-memory joins, I saw gains of about 3% for merge join, 5% for hash join, and 10% for nestloop (@CentOS6), for simple 1-level joins with aggregation, similar to what you mentioned in a previous mail, like this:

=# SELECT count(*) FROM t1 JOIN t2 USING (id);

Of course, the gain would be trivial when returning many tuples, or when scans go to disk. I haven't measured the loss from the additional computation when the query is not regarded as a single join.

Explain representation: I don't like that 'Unique Join:' occupies its own line in the explain output; moreover, it doesn't show the meaning clearly. What about a representation like the following? Or, it might not be necessary to appear there at all.

 Nested Loop ...
   Output: ...
-  Unique Join: Yes
   ->  Seq Scan on public.t2  (cost=...
-  ->  Seq Scan on public.t1  (cost=...
+  ->  Seq Scan on public.t1 (unique)  (cost=...

Coding: The style looks OK. It can be applied on master, and it looks to work as you expected for some cases. Random comments follow.

- Looking at specialjoin_is_unique_join, you seem to assume that !delay_upper_joins is a sufficient condition for not being a unique join. The former simply indicates "don't commute with upper OJs", while the latter seems to indicate that the RHS is unique. Could you tell me what kind of relationship there is between them?

- The naming "unique join" seems a bit obscure to me, but I don't have a better idea :( However, the member name has_unique_join would be better as is_unique_join.

- eclassjoin_is_unique_join involves a seemingly semi-exhaustive scan over ec members for every element in joinlist. Even if not, it might be a bit too expensive for the gain. Could you achieve the equivalent with smaller code? 
regards, Jan 2015 00:37:19 +1300, David Rowley dgrowle...@gmail.com wrote in caaphdvod_ucmoupovdpxbnkw50o14x3wwkojmzlxkbbn71z...@mail.gmail.com On 1 January 2015 at 02:47, David Rowley dgrowle...@gmail.com wrote: Hi, I've been hacking a bit at the join code again... This time I've been putting some effort into optimising the case where the inner side of the join is known to be unique. For example, given the tables: create table t1 (id int primary key); create table t2 (id int primary key); And query such as: select * from t1 left outer join t2 on t1.id=t2.id; It is possible to deduce at planning time that for each row in the outer relation, only 0 or 1 rows can exist in the inner relation, (inner being t2) I've been hacking at this unique join idea again and I've now got it working for all join types -- Patch attached. Here's how the performance is looking: postgres=# create table t1 (id int primary key); CREATE TABLE postgres=# create table t2 (id int primary key); CREATE TABLE postgres=# insert into t1 select x.x from generate_series(1,100) x(x); INSERT 0 100 postgres=# insert into t2 select x.x from generate_series(1,100) x(x); INSERT 0 100 postgres=# vacuum analyze; VACUUM postgres=# \q Query: select count(t1.id) from t1 inner join t2 on t1.id=t2.id; With Patch on master as of 32bf6ee D:\Postgres\install\binpgbench -f d:\unijoin3.sql -T 60 -n postgres transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 78 latency average: 769.231 ms tps = 1.288260 (including connections establishing) tps = 1.288635 (excluding connections establishing) Master as of 32bf6ee D:\Postgres\install\binpgbench -f d:\unijoin3.sql -T 60 -n postgres transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 70 latency average: 857.143 ms tps = 1.158905 (including 
connections establishing) tps = 1.159264 (excluding connections establishing) That's a 10% performance increase. I still need to perform more thorough benchmarking with different data types. One weird thing that I noticed before is that in an earlier revision of the patch in the executor's join Initialise node code, I had set the unique_inner to true for semi joins and replaced the SEMI_JOIN check with a unique_join check in the execute node for each join method. With this the performance results barely changed from standard... I've yet to find out why. I don't even know what you did precisely, but if you do such a thing, you perhaps should change the name of unique_inner to something representing that it reads up to one row from the inner for one outer row. (Sorry I have no good idea for this..) The patch also has added a property to the EXPLAIN (VERBOSE)
Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
On 2015-02-03 14:18:02 +0900, Michael Paquier wrote: - "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u", - lock->xid, lock->dbOid, lock->relOid); + "RecoveryLockList contains entry for lock no longer recorded by lock manager: xid %u", + lock->xid); This patch is making the information provided less verbose, and I think that it is useful to have some information not only about the lock held, but also about the database and the relation. It's debug4 or impossible stuff that lock.c already warned about - I doubt anybody has ever actually looked at it in a long while, if ever. If we really want to provide something more we can use something like LOCK_PRINT() - but I really doubt it's worth either the notational or the verbosity overhead. Also, ISTM that StandbyAcquireLock should still use a database OID and a relation OID instead of only a LOCKTAG, and SET_LOCKTAG_RELATION should be set in StandbyAcquireLock while ResolveRecoveryConflictWithLock is extended only with the lock mode as new argument. (Patch 2 adds many calls to SET_LOCKTAG_RELATION, btw, justifying keeping the API changes minimal.) But there are now callers acquiring other locks than relation locks, like dbase_redo() acquiring an object lock. And we need to acquire those via the standby mechanism to avoid races around release. We could add a separate wrapper for relation locks, but imo the locktag move to the callers saved about as many lines in some places as it cost in others. In patch 2, isn't it necessary to bump XLOG_PAGE_MAGIC? I don't think so, there's no incompatible change. Thanks for having a look! Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] File based Incremental backup v9
Il 02/02/15 22:28, Magnus Hagander ha scritto: On Mon, Feb 2, 2015 at 10:06 PM, Robert Haas robertmh...@gmail.com mailto:robertmh...@gmail.com wrote: On Sat, Jan 31, 2015 at 6:47 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it mailto:marco.nenciar...@2ndquadrant.it wrote: Il 31/01/15 17:22, Erik Rijkers ha scritto: On Sat, January 31, 2015 15:14, Marco Nenciarini wrote: 0001-public-parse_filename_for_nontemp_relation.patch 0002-copydir-LSN-v2.patch 0003-File-based-incremental-backup-v8.patch Hi, It looks like it only compiles with assert enabled. It is due to a typo (assert instead of Assert). You can find the updated patch attached to this message. I would sure like it if you would avoid changing the subject line every time you post a new version of this patch. It breaks the threading for me. +1 - it does break gmail. Ok, sorry for that. It seems to have also broken it for the CommitFest app, which thinks v3 is the last version. I was not able to attach the new version. The CF app has detected that it's the same thread, because of the headers (gmail is the buggy one here - the headers of the email are perfectly correct). It does not, however, pick up and show the change of subject there (but you can see it if you click the link for the latest version in the archives - the links under latest or latest attachment both go to the v9 patch). When I clicked on attach thread without having logged in, it took me to a bad URL. When I clicked on it after having logged in, it Clearly a bug. purported to work, but AFAICS, it didn't actually do anything. That's because the thread is already there, and you're adding it again. Of course, it wouldn't hurt if it actually told you that :) I'm also confused by the (Patch: No) part at the end of every line if you expand the last attachment line. Every message shown here contains one or more patches attached. 
Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it signature.asc Description: OpenPGP digital signature
[HACKERS] How about to have relnamespace and relrole?
Hello, Most OID types have corresponding reg* alias types. These are very convenient when looking into the system catalogs/views, but there are no such types for user IDs and namespace IDs. What do you think about having these new OID types? The usefulness of regnamespace is doubtful, but regrole should be useful because a column like 'owner' appears in many places. For example, this works as follows.

SELECT relnamespace::regnamespace, relname, relowner::regrole
  FROM pg_class WHERE relnamespace = 'public'::regnamespace;
 relnamespace | relname | relowner
--------------+---------+----------
 public       | t1      | horiguti
(1 row)

What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center

From cd1137f940aa63c3597e76fd740b05778654d3bb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Wed, 21 Jan 2015 11:38:21 +0900
Subject: [PATCH 1/2] Add regnamespace.
---
 doc/src/sgml/datatype.sgml            | 19 ---
 src/backend/utils/adt/regproc.c       | 97 +++
 src/include/catalog/pg_cast.h         |  7 +++
 src/include/catalog/pg_proc.h         | 10
 src/include/catalog/pg_type.h         |  5 ++
 src/include/utils/builtins.h          |  5 ++
 src/test/regress/expected/regproc.out | 22
 src/test/regress/sql/regproc.sql      |  4 ++
 8 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index edf636b..9799e18 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4422,6 +4422,13 @@ SELECT * FROM pg_attribute
    </row>

    <row>
+    <entry><type>regnamespace</></entry>
+    <entry><structname>pg_namespace</></entry>
+    <entry>namespace (schema) name</entry>
+    <entry><literal>pg_catalog</></entry>
+   </row>
+
+   <row>
     <entry><type>regtype</></entry>
     <entry><structname>pg_type</></entry>
     <entry>data type name</entry>
@@ -4446,12 +4453,12 @@ SELECT * FROM pg_attribute
 </table>

 <para>
-All of the OID alias types accept schema-qualified names, and will
-display schema-qualified names on output if the object would not
-be found in the current search path without being qualified.
-The <type>regproc</> and <type>regoper</> alias types will only
-accept input names that are unique (not overloaded), so they are
-of limited use; for most uses <type>regprocedure</> or
+All of the OID alias types except regnamespace accept schema-qualified
+names, and will display schema-qualified names on output if the object
+would not be found in the current search path without being qualified.
+The <type>regproc</> and <type>regoper</> alias types will only accept
+input names that are unique (not overloaded), so they are of limited use;
+for most uses <type>regprocedure</> or
 <type>regoperator</> are more appropriate. For <type>regoperator</>,
 unary operators are identified by writing <literal>NONE</> for the unused
 operand.

diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 3d1bb32..b09aa2a 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1159,6 +1159,103 @@ regclasssend(PG_FUNCTION_ARGS)

 /*
+ * regnamespacein		- converts namespace name to namespace OID
+ *
+ * We also accept a numeric OID, for symmetry with the output routine.
+ *
+ * '-' signifies unknown (OID 0).  In all other cases, the input must
+ * match an existing pg_namespace entry.
+ */
+Datum
+regnamespacein(PG_FUNCTION_ARGS)
+{
+	char	   *nsp_name_or_oid = PG_GETARG_CSTRING(0);
+	Oid			result = InvalidOid;
+
+	/* '-' ? */
+	if (strcmp(nsp_name_or_oid, "-") == 0)
+		PG_RETURN_OID(InvalidOid);
+
+	/* Numeric OID? */
+	if (nsp_name_or_oid[0] >= '0' &&
+		nsp_name_or_oid[0] <= '9' &&
+		strspn(nsp_name_or_oid, "0123456789") == strlen(nsp_name_or_oid))
+	{
+		result = DatumGetObjectId(DirectFunctionCall1(oidin,
+							CStringGetDatum(nsp_name_or_oid)));
+		PG_RETURN_OID(result);
+	}
+
+	/* Else it's a name */
+	result = get_namespace_oid(nsp_name_or_oid, false);
+
+	PG_RETURN_OID(result);
+}
+
+/*
+ * to_regnamespace		- converts nspname to namespace OID
+ *
+ * If the name is not found, we return NULL. 
+ */
+Datum
+to_regnamespace(PG_FUNCTION_ARGS)
+{
+	char	   *nsp_name = PG_GETARG_CSTRING(0);
+	Oid			result;
+
+	result = get_namespace_oid(nsp_name, true);
+
+	if (OidIsValid(result))
+		PG_RETURN_OID(result);
+	else
+		PG_RETURN_NULL();
+}
+
+/*
+ * regnamespaceout		- converts namespace OID to nspname
+ */
+Datum
+regnamespaceout(PG_FUNCTION_ARGS)
+{
+	Oid			nspid = PG_GETARG_OID(0);
+	char	   *result;
+
+	if (nspid == InvalidOid)
+	{
+		result = pstrdup("-");
+		PG_RETURN_CSTRING(result);
+	}
+
+	result = get_namespace_name(nspid);
+	if (result)
+		PG_RETURN_CSTRING(result);
+	else
+		PG_RETURN_NULL();
+}
+
+/*
+ * regnamespacerecv		- converts external binary format to regnamespace
+ */
+Datum
+regnamespacerecv(PG_FUNCTION_ARGS)
+{
+	/* Exactly the same as oidrecv, so share code
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 01/30/2015 01:38 AM, Peter Geoghegan wrote: On the stress-testing front, I'm still running Jeff Janes' tool [1], while also continuing to use his Postgres modifications to artificially increase the XID burn rate. [1] https://github.com/petergeoghegan/jjanes_upsert I followed the instructions in README.md to reproduce this. I downloaded the tool, applied the upsert patchset, applied the hack to parse_clause.c as instructed in the README.md file, installed btree_gist, and ran count_upsert_exclusion.pl. It failed immediately with an assertion failure: TRAP: FailedAssertion(!(node->spec != SPEC_INSERT || node->arbiterIndex != ((Oid) 0)), File: nodeModifyTable.c, Line: 1619) Is that just because of the hack in parse_clause.c? With assertions disabled, count_upsert_exclusion.pl ran successfully to the end. I also tried running VACUUM FREEZE upsert_race_test in a loop in another session at the same time, but it didn't make a difference. How quickly do you see the errors? I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set jj_xid=1 to increase XID burn rate, but I'm still not seeing any errors. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 02/03/2015 07:50 AM, Robert Haas wrote: On Tue, Feb 3, 2015 at 10:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: That's the whole point of this patch. max_checkpoint_segments = 10, or max_checkpoint_segments = 160 MB, means that the system will begin a checkpoint so that when the checkpoint completes, and it truncates away or recycles old WAL, the total size of pg_xlog is 160 MB. That's different from our current checkpoint_segments setting. With checkpoint_segments, the documented formula for calculating the disk usage is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's a lot less intuitive to set. Hmm, that's different from what I was thinking. We probably shouldn't call that max_checkpoint_segments, then. I got confused and thought you were just trying to decouple the number of segments that it takes to trigger a checkpoint from the number we keep preallocated. Wait, what? Because the new setting is an actual soft maximum, we *shouldn't* call it a maximum? Or are you saying something else? On 02/03/2015 04:25 AM, Heikki Linnakangas wrote: On 02/02/2015 04:21 PM, Andres Freund wrote: I think we need to increase checkpoint_timeout too - that's actually just as important for the default experience from my pov. 5 minutes often just unnecessarily generates FPWs en masse. I have yet to see any serious benchmarking on checkpoint_timeout. It does seem that for some workloads on some machines a longer timeout is better, but I've also seen workloads where a longer timeout decreases throughput or raises IO. So absent some hard numbers, I'd be opposed to changing the default. I'll open the bidding at 1600MB (aka 100). Fine with me. I wouldn't object to raising it a little bit, but that's way too high. It's entirely possible to have a small database that generates a lot of WAL. A table that has only a few rows, but is updated very very frequently, for example. 
And checkpointing such a database is quick too, so frequent checkpoints are not a problem. You don't want to end up with 1.5 GB of WAL on a 100 MB database. I suggest 192MB instead (12 segments). That almost doubles our current real default, without requiring huge disk space which might surprise some users. In practice, checkpoint_segments is impossible to automatically tune correctly. So let's be conservative. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers