Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 04/26/2014 09:27 PM, Andres Freund wrote:
> I don't think we need to decide this without benchmarks proving the benefits. I basically want to know whether somebody has an actual use case - even if I really, really, can't think of one - of setting max_connections even remotely that high. If there's something fundamental out there that'd make changing the limit impossible, doing benchmarks wouldn't be worthwhile.

It doesn't seem unreasonable to have a database with tens of thousands of connections. Sure, performance will suffer, but if the connections sit idle most of the time so that the total load is low, who cares? Sure, you could use a connection pooler, but it's even better if you don't have to.

If there are big gains to be had from limiting the number of connections, I'm not against it. For the purpose of shrinking BufferDesc, though, I have a feeling there might be other lower-hanging fruit in there. For example, wait_backend_pid and freeNext are not used very often, so they could be moved elsewhere, to a separate array. And buf_id and the LWLock pointers could be calculated from the memory address of the struct.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
On 04/27/2014 11:33 PM, Tom Lane wrote:
> I wrote:
>> On closer inspection, the issue here is really that putting relpath.h/.c in common/ was completely misguided from the get-go. It's unnecessary: there's nothing outside the backend that uses it, except for contrib/pg_xlogdump which could very easily do without it. And relpath.h is a serious failure from a modularity standpoint anyway, because there is code all over the backend that has intimate familiarity with the pathname construction rules. We could possibly clean that up to the extent of being able to hide TABLESPACE_VERSION_DIRECTORY inside relpath.c, but what then? We'd still be talking about having CATALOG_VERSION_NO compiled into frontend code for any frontend code that actually made use of relpath.c, which is surely not such a great idea. So it seems to me the right fix for the relpath end of it is to push most of relpath.c back where it came from, which I think was backend/catalog/.
>
> In short, what I'm proposing here is to revert commit a73018392636ce832b09b5c31f6ad1f18a4643ea, lock, stock, and barrel. According to the commit message, the point of that was to allow pg_xlogdump to use relpath(), but I do not see it doing so; and if it tried, I don't know where it would get an accurate value of CATALOG_VERSION_NO from. So I think that was just poorly thought out. What contrib/pg_xlogdump *is* using is the forkNames[] array, which it uses to print the fork-number field of a BkpBlock symbolically. I'm inclined to think that printing it numerically is good enough there, and a lot more robust. Comments? If there's anyone who has a really good use case for using relpath() from outside the backend, better speak up.

I'm using it in the pg_rewind tool. It needs to know how to map relfilenodes to physical files. It has quite intimate knowledge of the on-disk layout anyway, so it wouldn't be the end of the world to just copy-paste that code. But I would rather not, of course.
- Heikki
Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)
On 2014-04-27 18:44:16 -0400, Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> Just for some clarity, that also happens with expressions like: WHERE ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, '_RETURN', NULL) ORDER BY ROW(ev_class, rulename, ev_action); Previously we wouldn't detoast ev_action here, but now we do.
> ... The extra detoast you're seeing in this example is caused by the ROW() in the ORDER BY. Which, as I said, is just bad SQL coding.

Good point.

>> I agree that this needs to get backpatched at some point. But people also get very upset if queries get slower by an order of magnitude or two after a minor version upgrade. And this does have the potential to do that, no?
> They don't get nearly as upset as they do if the database loses their data. Yes, in an ideal world, we could fix this and not suffer any performance loss anywhere. But no such option is on the table, and waiting is not going to make one appear. What waiting *will* do is afford more opportunities for this bug to eat people's data.

We make tradeoffs between performance and safety *all the time*. A new performance regression that has the potential of affecting a fair number of installations very well can be worse than a decade-old correctness bug - a bug that will only hit in some specific usage scenarios and will only affect individual datums. And you *have* come up with an alternative approach. It might very well be inferior, but that certainly doesn't make this approach one without alternatives. Anyway, I've now voiced my doubts about immediate backpatching. Any other opinions?

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 10:48:30 +0300, Heikki Linnakangas wrote:
> On 04/26/2014 09:27 PM, Andres Freund wrote:
>> I don't think we need to decide this without benchmarks proving the benefits. I basically want to know whether somebody has an actual use case - even if I really, really, can't think of one - of setting max_connections even remotely that high. If there's something fundamental out there that'd make changing the limit impossible, doing benchmarks wouldn't be worthwhile.
> It doesn't seem unreasonable to have a database with tens of thousands of connections. Sure, performance will suffer, but if the connections sit idle most of the time so that the total load is low, who cares? Sure, you could use a connection pooler, but it's even better if you don't have to.

65k connections will be absolutely *disastrous* for performance because of the big PGPROC et al. I *do* think we have to make life easier for users here by supplying builtin pooling at some point, but that's just a separate feature.

> If there are big gains to be had from limiting the number of connections, I'm not against it. For the purpose of shrinking BufferDesc, though, I have a feeling there might be other lower-hanging fruit in there. For example, wait_backend_pid and freeNext are not used very often, so they could be moved elsewhere, to a separate array. And buf_id and the LWLock pointers could be calculated from the memory address of the struct.

The main reason I want to shrink it is that I want to make pin/unpin buffer lockless, and all solutions I can come up with for that require the flags to be in the same uint32 as the refcount. For performance it'd be beneficial if the usagecount also fits in there. I agree that we can move a good part of BufferDesc into a separately indexed array. io_in_progress_lock, freeNext and wait_backend_pid are imo good candidates.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] allowing VACUUM to be cancelled for conflicting locks
Hi. In the past, we've had situations where "everything is hung" turned out to be because of a script that ran a manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be.

One way to do this would be to add an allow_vacuum_cancel GUC that, if set, would cause VACUUM to set the (suitably renamed) PROC_IS_AUTOVACUUM flag on its own process. Another would be to add an extra option to the VACUUM command that enables this behaviour. The former has the advantage of being backwards-compatible with existing scripts that run manual VACUUM. The latter is arguably nicer. Any thoughts?

-- Abhijit
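To make the two moving parts of the proposal concrete, here is a minimal sketch (not PostgreSQL's actual code): PROC_IS_AUTOVACUUM is the real flag name, but the struct and functions below are simplified, hypothetical stand-ins for PGPROC and the deadlock-detector check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag name from the real code; everything else here is a toy model. */
#define PROC_IS_AUTOVACUUM 0x01

typedef struct
{
    int     pid;
    uint8_t vacuumFlags;
} FakeProc;

/* What the proposed allow_vacuum_cancel GUC (or VACUUM option) would do:
 * let a manual VACUUM advertise itself the same way autovacuum does. */
static void mark_vacuum_cancellable(FakeProc *proc)
{
    proc->vacuumFlags |= PROC_IS_AUTOVACUUM;
}

/* Sketch of the check made on a conflicting lock: a blocked backend may
 * cancel the lock holder only if the holder set the flag. */
static bool can_cancel_blocker(const FakeProc *blocker)
{
    return (blocker->vacuumFlags & PROC_IS_AUTOVACUUM) != 0;
}
```

With the GUC approach, existing scripts would get the new behaviour simply by setting allow_vacuum_cancel before their VACUUM calls.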
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 04/28/2014 12:39 PM, Andres Freund wrote:
> On 2014-04-28 10:48:30 +0300, Heikki Linnakangas wrote:
>> It doesn't seem unreasonable to have a database with tens of thousands of connections. Sure, performance will suffer, but if the connections sit idle most of the time so that the total load is low, who cares? Sure, you could use a connection pooler, but it's even better if you don't have to.
> 65k connections will be absolutely *disastrous* for performance because of the big PGPROC et al.

Well, often that's still good enough.

> The main reason I want to shrink it is that I want to make pin/unpin buffer lockless, and all solutions I can come up with for that require the flags to be in the same uint32 as the refcount. For performance it'd be beneficial if the usagecount also fits in there.

Would it be enough to put only some of the flags in the same uint32?

- Heikki
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
On Sun, Apr 27, 2014 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> ... and not, in particular, parse analysis or rewrite time?

I think breaking those out would be a good idea. Especially rewrite time.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 13:32:45 +0300, Heikki Linnakangas wrote:
> On 04/28/2014 12:39 PM, Andres Freund wrote:
>> 65k connections will be absolutely *disastrous* for performance because of the big PGPROC et al.
> Well, often that's still good enough.

That may be true for 2-4k max_connections, but 65k? That won't even *run*, not to speak of doing something, in most environments, because of the number of processes required. Even making only 20k connections will probably crash your computer.

>> The main reason I want to shrink it is that I want to make pin/unpin buffer lockless, and all solutions I can come up with for that require the flags to be in the same uint32 as the refcount. For performance it'd be beneficial if the usagecount also fits in there.
> Would it be enough to put only some of the flags in the same uint32?

It's probably possible, but it would make things more complicated. For a feature nobody is ever going to use.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
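To make the tradeoff concrete, here is a minimal sketch (not PostgreSQL code; the bit layout is a hypothetical choice for illustration) of the kind of packed, atomically updated buffer state being discussed: a 16-bit refcount - which is where a MAX_BACKENDS cap of 2^16 pays off - plus a 4-bit usagecount and flag bits, all in one uint32 so that pin/unpin can be a compare-and-swap loop instead of a spinlock.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical packing: low 16 bits refcount (hence MAX_BACKENDS <= 2^16),
 * 4 bits usagecount, then flag bits.  Field widths are illustrative only. */
#define BUF_REFCOUNT_MASK    ((1u << 16) - 1)
#define BUF_USAGECOUNT_SHIFT 16
#define BUF_USAGECOUNT_ONE   (1u << BUF_USAGECOUNT_SHIFT)
#define BUF_USAGECOUNT_MASK  (0xFu << BUF_USAGECOUNT_SHIFT)
#define BUF_MAX_USAGE        5
#define BUF_FLAG_VALID       (1u << 20)

typedef struct
{
    _Atomic uint32_t state;     /* refcount | usagecount | flags */
} ToyBufDesc;

/* Lockless pin: bump refcount (and usagecount, up to its ceiling) in one
 * CAS loop; returns false if the buffer isn't valid and the caller must
 * take a slow path. */
static bool pin_buffer(ToyBufDesc *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        uint32_t usage, new;

        if (!(old & BUF_FLAG_VALID))
            return false;
        usage = (old & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
        new = old + 1;                  /* refcount++ */
        if (usage < BUF_MAX_USAGE)
            new += BUF_USAGECOUNT_ONE;  /* usagecount++ */
        /* on failure, 'old' is reloaded with the current state */
        if (atomic_compare_exchange_weak(&buf->state, &old, new))
            return true;
    }
}

static void unpin_buffer(ToyBufDesc *buf)
{
    atomic_fetch_sub(&buf->state, 1);   /* refcount-- */
}
```

This is exactly why the flags (and ideally the usagecount) have to share the uint32 with the refcount: the CAS has to observe and update all of them as one word.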
Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table
On Fri, Apr 25, 2014 at 8:58 PM, Josh Berkus j...@agliodbs.com wrote:
> Well, I've already had collisions with UUID-OSSP, in production, with only around 20 billion values. So clearly there aren't 122 bits of true randomness in OSSP. I can't speak for other implementations because I haven't tried them.

Interesting. The statistical chance of this happening should be approximately 4e-17. Are you certain that this was due to uuid-ossp and not an application bug? Can you say what kind of operating system and environment that was? I skimmed the sources of uuid-ossp 1.6.2 and it seems to be doing the right thing, using /dev/urandom or /dev/random on Unixes and CryptGenRandom on Windows. Barring any bugs, of course. However, if these fail for whatever reason (e.g. out of file descriptors), it falls back to libc random(), which is clearly broken.

On Fri, Apr 25, 2014 at 6:18 PM, Greg Stark st...@mit.edu wrote:
> The difficulty lies not really in the PRNG implementation (which is hard but well enough understood that it's not much of an issue these days). The difficulty lies in obtaining enough entropy.

There are ways of obtaining enough entropy, and they are available.

> Obtaining enough entropy requires access to hardware devices which means a kernel system call.

This is a solved problem in most environments, too. The kernel collects entropy from unpredictable events and then seeds a global CSPRNG with that. This collection happens always, regardless of whether you request random numbers or not, so it essentially comes for free. Applications can then request output from this CSPRNG. The reason is that this infrastructure is necessary for more critical tasks than generating UUIDs: pretty much all of cryptography requires random numbers.

> They also deplete the available entropy pool for other sources, which may mean they have security consequences.

This only affects the Linux /dev/random, which is discouraged these days for that reason. Applications should use urandom instead.
To my knowledge, there are no other operating systems that have this depletion behavior.

Regards, Marti
Re: [HACKERS] includedir_internal headers are not self-contained
Re: Heikki Linnakangas 2014-04-28 535e09b7.3090...@vmware.com
>> Comments? If there's anyone who has a really good use case for using relpath() from outside the backend, better speak up.
> I'm using it in the pg_rewind tool. It needs to know how to map relfilenodes to physical files. It has quite intimate knowledge of the on-disk layout anyway, so it wouldn't be the end of the world to just copy-paste that code. But I would rather not, of course.

Isn't pg_rewind so low-level and close to the server that it needs tons of server headers anyway, including one that would still have relpath()? We are talking here about what headers pure client apps need.

Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On Mon, Apr 28, 2014 at 7:37 AM, Andres Freund and...@2ndquadrant.com wrote:
>> Well, often that's still good enough.
> That may be true for 2-4k max_connections, but 65k? That won't even *run*, not to speak of doing something, in most environments, because of the number of processes required. Even making only 20k connections will probably crash your computer.

I'm of two minds on this topic. On the one hand, cat /proc/sys/kernel/pid_max on a Linux system I just tested (3.2.6) returns 65536, so we'd run out of PID space before we could reach 64k backends. On the other hand, that value can easily be increased to a few million via, e.g., sysctl -w kernel.pid_max=4194303, and I imagine that as machines continue to get bigger there will be more and more people wanting to do things like that. I think the fact that making 20k connections might crash your computer is an artifact of other problems that we really ought to also fix (like per-backend memory utilization, and lock contention on various global data structures) rather than something to bake into more places. In PostgreSQL 25.3, perhaps we'll be able to run distributed PostgreSQL clusters that can service a million simultaneous connections across dozens of physical machines. Then again, there might not be much left of our current buffer manager by that point, so maybe what we decide right now isn't that relevant.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Different behaviour of concate() and concate operator ||
Hi all, I need a little help understanding text concatenation. The concat() function and the || operator behave differently if any input is NULL:

test=# select 'abc' || NULL;
 ?column?
----------
 
(1 row)

test=# select concat('abc', NULL);
 concat
--------
 abc
(1 row)

There is a simple reason: the || operator uses the textcat() function, which is STRICT. My questions are:

1. Does textcat() need to be STRICT, and why?
2. textcat() is used for the || operator; couldn't that instead be implemented using the concat() function?

Thanks in advance. Regards, Amul Sul
Re: [HACKERS] includedir_internal headers are not self-contained
On 04/28/2014 03:29 PM, Christoph Berg wrote:
> Isn't pg_rewind so low-level and close to the server that it needs tons of server headers anyway, including one that would still have relpath()? We are talking here about what headers pure client apps need.

It knows how to decode WAL, similar to pg_xlogdump. And it knows about the data directory layout, in particular, how relfilenodes are mapped to physical files. Those are the low-level parts. So it certainly needs some server headers, but I wouldn't call it tons.

- Heikki
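For readers outside the thread, the mapping being discussed is roughly the following (a simplified sketch covering only the default- and global-tablespace cases; the real relpath() also handles non-default tablespaces - where TABLESPACE_VERSION_DIRECTORY, and hence CATALOG_VERSION_NO, enters the picture - and temporary relations):

```c
#include <stdio.h>

/* Relation forks, as in the backend, with their file-name suffixes. */
typedef enum { FORK_MAIN, FORK_FSM, FORK_VM, FORK_INIT } ToyForkNumber;

static const char *fork_suffix[] = { "", "_fsm", "_vm", "_init" };

/* Sketch of the path rules relpath() encodes for the main tablespaces:
 * shared catalogs live under global/, ordinary relations under
 * base/<database oid>/<relfilenode>, with a per-fork suffix. */
static void toy_rel_path(char *buf, size_t buflen,
                         unsigned dboid, unsigned relfilenode,
                         ToyForkNumber fork)
{
    if (dboid == 0)
        snprintf(buf, buflen, "global/%u%s",
                 relfilenode, fork_suffix[fork]);
    else
        snprintf(buf, buflen, "base/%u/%u%s",
                 dboid, relfilenode, fork_suffix[fork]);
}
```

The non-default-tablespace branch is what drags the catalog version into the path, which is the modularity problem Tom raises below.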
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Fri, Apr 18, 2014 at 11:46 AM, Greg Stark st...@mit.edu wrote:
> On Fri, Apr 18, 2014 at 4:14 PM, Robert Haas robertmh...@gmail.com wrote:
>> I am a bit confused by this remark. In *any* circumstance when you evict, you're incurring precisely one page fault I/O when the page is read back in. That doesn't mean that the choice of which page to evict is irrelevant.
> But you might be evicting a page that will be needed soon or one that won't be needed for a while. If it's not needed for a while you might be able to avoid many page evictions by caching a page that will be used several times.

Sure.

> If all the pages currently in RAM are hot -- meaning they're hot enough that they'll be needed again before the page you're reading in -- then they're all equally bad to evict.

Also true. But the problem is that it is very rarely, if ever, the case that all pages are *equally* hot. On a pgbench workload, for example, I'm very confident that while there's not really any cold data, the btree roots and visibility map pages are a whole lot hotter than a randomly-selected heap page. If you evict a heap page, you're going to need it back pretty quick, because it won't be long until the random-number generator again chooses a key that happens to be located on that page. But if you evict the root of the btree index, you're going to need it back *immediately*, because the very next query, no matter what key it's looking for, is going to need that page. I'm pretty sure that's a significant difference.

> I'm trying to push us away from the gut instinct that frequently used pages are important to cache and towards actually counting how many I/Os we're saving. In the extreme it's possible to simulate any cache algorithm on a recorded list of page requests and count how many page misses it generates to compare it with an optimal cache algorithm.

There's another issue, which Simon clued me into a few years back: evicting the wrong page can cause system-wide stalls. In the pgbench case, evicting a heap page will force the next process that chooses a random number that maps to a tuple on that page to wait for the page to be faulted back in. That's sad, but unless the scale factor is small compared to the number of backends, there will probably be only ONE process waiting. On the other hand, if we evict the btree root, within a fraction of a second, EVERY process that isn't already waiting on some other I/O will be waiting for that I/O to complete. The impact on throughput is much bigger in that case.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 21, 2014 at 6:38 PM, Jim Nasby j...@nasby.net wrote:
>> I feel that if there is no memory pressure, frankly it doesn't matter much what gets out and what doesn't. The case I am specifically targeting is when the clock sweep gets to move about a lot, i.e. high-memory-pressure workloads. Of course, I may be totally wrong here.
> Well, there's either memory pressure or there isn't. If there isn't, then it's all moot *because we're not evicting anything*.

I don't think that's really true. A workload can fit within shared_buffers at some times and spill beyond it at others. Every time it fits within shared_buffers for even a short period of time, the reference count of any buffer that's not ice-cold goes to 5 and we essentially lose all knowledge of which buffers are relatively hotter. Then, when we spill out again, evictions are random.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
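A toy model of the clock sweep (a sketch, not the backend's freelist.c) makes Robert's point observable: once every buffer has been driven to the maximum usage count, the sweep must decay all of them through full laps before any victim emerges, and the first victim carries no hotness information at all.

```c
/* Toy clock sweep: pins bump a per-buffer usage count (capped at
 * MAX_USAGE); the sweep decrements counts and evicts the first unpinned
 * buffer whose count has reached zero. */
#define NBUFFERS  4
#define MAX_USAGE 5

typedef struct
{
    int usage;      /* decayed by the sweep */
    int pinned;     /* pinned buffers are never victims */
} ToyBuf;

static int clock_sweep(ToyBuf bufs[], int *hand)
{
    int tries = NBUFFERS * (MAX_USAGE + 1);

    while (tries-- > 0)
    {
        ToyBuf *b = &bufs[*hand];
        int     victim = *hand;

        *hand = (*hand + 1) % NBUFFERS;
        if (b->pinned)
            continue;
        if (b->usage > 0)
            b->usage--;     /* survived this lap; reconsider next time */
        else
            return victim;
    }
    return -1;              /* everything pinned; the real code errors out */
}
```

If the workload briefly fits in cache so that every buffer sits at usage 5, the next eviction after spilling again is decided purely by where the hand happens to be - which is the "evictions are random" behavior described above.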
Re: [HACKERS] shm_mq inconsistent behavior of SHM_MQ_DETACHED
On Tue, Apr 22, 2014 at 9:55 AM, Petr Jelinek p...@2ndquadrant.com wrote:
> I was playing with shm_mq and found slightly odd behavior with detaching after sending messages. The following sequence behaves as expected (the receiver gets 2 messages):
>
> P1 - set_sender
> P1 - attach
> P2 - set_receiver
> P2 - attach
> P1 - send
> P2 - receive
> P1 - send
> P1 - detach
> P2 - receive
> P2 - detach
>
> But if I do the first receive after the detach, as in this sequence:
>
> P1 - set_sender
> P1 - attach
> P2 - set_receiver
> P2 - attach
> P1 - send
> P1 - send
> P1 - detach
> P2 - receive
>
> I get SHM_MQ_DETACHED on the receiver even though there are messages in the ring buffer.

That's a bug.

> The reason for this behavior is that mqh_counterparty_attached is only set by shm_mq_receive. This does not seem to be consistent - I would either expect to always get SHM_MQ_DETACHED when the other party has detached, or to always get all remaining messages that are in the queue (and I would strongly prefer the latter). Maybe shm_mq_get_bytes_written should be used to determine whether there is something left for us to read in the receiver if we hit the !mqh_counterparty_attached code path with a detached sender?

That's probably not a good idea, because there could be just a partial message left in the buffer, if the sender died midway through writing it. I suspect that attacking the problem that way will lead to a bunch of nasty edge cases. I'm thinking that the problem really revolves around shm_mq_wait_internal(). It returns true if the queue is attached but not detached, and false if either the detach has already happened, or if we establish via the background worker handle that it will never come. But in the case of receiving, we want to treat attached-then-detached as a success case, not a failure case. Can you see if the attached patch fixes it?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index c70f3bf..4f7dd9c 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -406,7 +406,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			if (shm_mq_get_sender(mq) == NULL)
 				return SHM_MQ_WOULD_BLOCK;
 		}
-		else if (!shm_mq_wait_internal(mq, &mq->mq_sender, mqh->mqh_handle))
+		else if (!shm_mq_wait_internal(mq, &mq->mq_sender, mqh->mqh_handle) &&
+				 shm_mq_get_sender(mq) == NULL)
 		{
 			mq->mq_detached = true;
 			return SHM_MQ_DETACHED;
Re: [HACKERS] Different behaviour of concate() and concate operator ||
amul sul sul_a...@yahoo.co.in writes:
> The concat function and the || operator behave differently if any input is NULL.

The behavior of || is specified by the SQL standard, and it says (SQL99 6.27 <string value expression> general rule 2a):

    a) If either S1 or S2 is the null value, then the result of the concatenation is the null value.

So no, we're not changing it to be more like concat().

regards, tom lane
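The mechanical difference behind Amul's question can be sketched as follows (plain C, a simplified model rather than the fmgr code): a STRICT function like textcat() is never even invoked when an input is NULL - the call machinery returns NULL directly - while concat() is not strict and treats NULL like an empty string.

```c
#include <stddef.h>
#include <stdio.h>

/* The underlying concatenation, which assumes non-NULL inputs. */
static char *textcat_impl(char *dst, const char *a, const char *b)
{
    sprintf(dst, "%s%s", a, b);
    return dst;
}

/* What the function-call machinery does for a STRICT function:
 * short-circuit to NULL without calling textcat_impl() at all. */
static char *call_strict_textcat(char *dst, const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return NULL;
    return textcat_impl(dst, a, b);
}

/* concat()'s non-strict behavior: NULL arguments are simply skipped. */
static char *call_concat(char *dst, const char *a, const char *b)
{
    sprintf(dst, "%s%s", a ? a : "", b ? b : "");
    return dst;
}
```

So making || use concat()'s semantics would require more than swapping the implementation function: it would change the operator's NULL behavior, which, as Tom notes, the standard pins down.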
[HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Tom Lane wrote:
> Can't completely get rid of #ifndef FRONTEND in palloc.h :-( pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h must be able to compile in a #define FRONTEND context.

Hmm, I had this patch in an abandoned branch from long ago, which I think helped remove postgres.h from pg_controldata. I remembered it just now because of this commit message. Maybe it's useful to re-remove the #ifndef FRONTEND from palloc.h. It's not rebased to latest master and maybe not even complete; if people think this approach is worthwhile I can try to clean it up and propose it more seriously; LMK. (Also if people think it needs further tweaks. I vaguely recall I didn't propose it back then because the set of stuff in the new header could be tweaked.)

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d960bbc..de06a36 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -30,6 +30,7 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 0c178c5..313486e 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -22,6 +22,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..8f185aa 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -22,6 +22,7 @@
 #endif
 #include "access/htup_details.h"
+#include "access/xlogproc.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/index.h"
 #include "catalog/pg_collation.h"
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fb2d81..54a09f0 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -43,6 +43,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3ddd596..4bd4740 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -34,6 +34,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "postmaster/fork_process.h"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 15c2320..08491a4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "access/xlogproc.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_control.h"
 #include "lib/ilist.h"
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 7ebf500..1ea9d6a 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -22,7 +22,7 @@
 #include <signal.h>
 #include <unistd.h>
-#include "access/xlog.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 8359da6..bdfe6ca 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -46,7 +46,7 @@
 #include <time.h>
 #include <unistd.h>
-#include "access/xlog.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "postmaster/walwriter.h"
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 911a66b..60e6d8d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -49,6 +49,7 @@
 #include "access/timeline.h"
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 8427006..22a9248 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -23,6 +23,7 @@
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/xact.h"
+#include "access/xlogproc.h"
 #include "catalog/catalog.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 98149fc..6a90657 100644
--- a/src/backend/utils/misc/guc.c
+++
Re: [HACKERS] includedir_internal headers are not self-contained
Heikki Linnakangas hlinnakan...@vmware.com writes:
> It knows how to decode WAL, similar to pg_xlogdump. And it knows about the data directory layout, in particular, how relfilenodes are mapped to physical files. Those are the low-level parts. So it certainly needs some server headers, but I wouldn't call it tons.

I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) In short, while I don't deny that there may be non-server programs that need to know about physical file paths, I do strongly doubt that relpath.h/.c in their current form are a good solution to that problem.

regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where everything is hung turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implied BETWEEN from join quals
On Tue, Apr 22, 2014 at 11:44 AM, Simon Riggs si...@2ndquadrant.com wrote: I was recently nudged to describe an optimisation of merge joins/sorts. Rather than describe that, I've looked at the general case: There are some additional implications we may make when joining tables... a particularly interesting one is that SELECT * FROM Fact JOIN Dim on Fact.col = Dim.col can be rewritten as SELECT * FROM Fact JOIN Dim on Fact.col = Dim.col WHERE ( Fact.col BETWEEN (SELECT min(col) FROM Dim) AND (SELECT max(col) FROM Dim) ) AND ( Dim.col BETWEEN (SELECT min(col) FROM Fact) AND (SELECT max(col) FROM Fact) ) which also works similarly for anti/semi-joins. If we have indexes on A.col and B.col then these additional quals can be derived cheaply at run-time and could have an important effect on optimisation. Interesting. This can pretty obviously produce a big win if things break our way. But it'll take some effort to determine whether the range of possible values for the join column is narrow enough to make the inferred BETWEEN clause selective enough to be useful, and that effort will be wasted if it turns out that the answer is no. I can certainly think of times when this would have helped me, so I kind of like the idea in theory ... but I won't be surprised if it turns out to be possible to construct realistic test cases where trying to figure this out causes too much increase in planning time. Actually, it seems possible to me that this could end up winning even if it doesn't enable the use of an index scan rather than a sequential scan, because eliminating rows during the scan is typically much cheaper than eliminating them during the join step. But it also seems possible that it could lose heavily in that case, if the extra steps during the scan phase don't eliminate enough rows to reduce the join cost to a sufficient degree. I'm not really sure how it will work out on balance.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
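The equivalence of the original join and Simon's rewritten form is easy to check mechanically. A minimal sketch using Python's sqlite3 module (standing in for any SQL engine, with made-up fact/dim data):

```python
import sqlite3

# Hypothetical fact/dim tables, used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (col INTEGER)")
cur.execute("CREATE TABLE dim (col INTEGER)")
cur.executemany("INSERT INTO fact VALUES (?)", [(i,) for i in range(100)])
cur.executemany("INSERT INTO dim VALUES (?)", [(i,) for i in range(40, 50)])

plain = cur.execute(
    "SELECT fact.col FROM fact JOIN dim ON fact.col = dim.col"
).fetchall()

# The same join with the implied BETWEEN quals added: rows outside
# dim's [min, max] range can be discarded during the scan, before
# they ever reach the join.
rewritten = cur.execute(
    """
    SELECT fact.col FROM fact JOIN dim ON fact.col = dim.col
    WHERE fact.col BETWEEN (SELECT min(col) FROM dim)
                       AND (SELECT max(col) FROM dim)
      AND dim.col BETWEEN (SELECT min(col) FROM fact)
                      AND (SELECT max(col) FROM fact)
    """
).fetchall()

assert sorted(plain) == sorted(rewritten)
```

Whether the rewrite pays off is exactly the planning-time question debated above; the rewrite itself is always semantics-preserving for an inner join on the listed columns.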
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
Robert Haas robertmh...@gmail.com writes: I think the fact that making 20k connections might crash your computer is an artifact of other problems that we really ought to also fix (like per-backend memory utilization, and lock contention on various global data structures) rather than baking it into more places. In PostgreSQL 25.3, perhaps we'll be able to run distributed PostgreSQL clusters that can service a million simultaneous connections across dozens of physical machines. Then again, there might not be much left of our current buffer manager by that point, so maybe what we decide right now isn't that relevant. Yeah. I think that useful use of 64K backends is far enough away that it shouldn't be a showstopper argument, assuming that we get something good in return for baking that into bufmgr. What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. That's untenable. And I don't want to be told that we can't improve the buffer management algorithms because adding another field would make the headers not fit in a cacheline. (For the same reason, I'm pretty unimpressed by the nearby suggestions that it'd be okay to put very tight limits on the number of bits in the buffer header flags or the usage count.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
On 04/28/2014 04:51 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 04/28/2014 03:29 PM, Christoph Berg wrote: Re: Heikki Linnakangas 2014-04-28 535e09b7.3090...@vmware.com I'm using it in the pg_rewind tool. It needs to know how to map relfilenodes to physical files. Isn't pg_rewind so low-level server-close that it needs tons of server headers anyway, including one that would still have relpath()? We are talking here about what headers pure client apps need. It knows how to decode WAL, similar to pg_xlogdump. And it knows about the data directory layout, in particular, how relfilenodes are mapped to physical files. Those are the low-level parts. So, it certainly needs some server headers, but I wouldn't call it tons. I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) Sure, that would be better. Although I don't have much hope to make it completely version-independent. At the moment, pg_rewind explicitly reads the control file (yeah, it knows about that too), and checks that the catalog version matches what pg_rewind was compiled with. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor improvements in alter_table.sgml
On Wed, Apr 23, 2014 at 3:46 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/04/15 15:27), Etsuro Fujita wrote: (2014/04/14 23:53), Robert Haas wrote: On Fri, Apr 11, 2014 at 5:00 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Attached is an updated version of the patch. I think the other changes deserve to be considered separately, and in particular I'm still not sure it's a good idea to document both OF type_name and type_name. I've agreed on that point, but I think apart from the others, the trivial typo should be corrected. Patch attached (doc-altertable-typo.patch). I noticed the description of index_name should also be corrected, because it is currently used not only in CLUSTER ON, but in ADD table_constraint_using_index. (It will also be used in REPLICA IDENTITY in 9.4.) Patch attached (doc-altertable-indexname.patch). Agreed. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 10:03:58 -0400, Tom Lane wrote: What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. Err. I am not thinking that at all. I am pretty sure I never made that argument. The reason I want to limit the number of connections is it allows *both*, shrinking the size of BufferDescs due to less alignment padding *and* stuffing the refcount and flags into one integer. That's untenable. And I don't want to be told that we can't improve the buffer management algorithms because adding another field would make the headers not fit in a cacheline. I think we need to move some less frequently used fields to a separate array to be future proof. Heikki suggested freeNext and wait_backend_pid; I added io_in_progress_lock. We could theoretically replace buf_id by calculating it based on the BufferDescriptors array, but that's probably not a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)
Andres Freund and...@2ndquadrant.com writes: On 2014-04-27 18:44:16 -0400, Tom Lane wrote: They don't get nearly as upset as they do if the database loses their data. Yes, in an ideal world, we could fix this and not suffer any performance loss anywhere. But no such option is on the table, and waiting is not going to make one appear. What waiting *will* do is afford more opportunities for this bug to eat people's data. We make tradeoffs between performance and safety *all the time*. Really? I'm not aware of any case where we've left unrecoverable data corruption go unfixed. It's true that we've rejected fixing some cases where plpgsql code might transiently try to use a stale toast pointer --- but that's a lot different from losing stored data permanently. And you *have* come up with an alternative approach. No, I haven't. I experimented with a different approach, at Noah's insistence. But I did not and will not try to extend it to a complete solution, first because I'm unsure that a 100% solution can reasonably be reached that way, and second because that approach *also* entails performance penalties --- unavoidable ones, unlike this approach where people can modify their SQL code to avoid fetching unnecessary columns. If somebody *else* is sufficiently hell-bent on doing it that way that they will take responsibility for completing and back-patching that approach, I will stand aside and let them find out the downsides. Otherwise, I propose to commit and back-patch this version. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Thu, Apr 24, 2014 at 1:39 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 04/17/2014 08:35 AM, Craig Ringer wrote: On 04/17/2014 04:47 AM, Petr Jelinek wrote: Well the logging is just too spammy in general when it comes to dynamic bgworkers but that's easy to fix in the future, no need to make decisions for 9.4. Agreed - it's the *API* that we need sorted out for 9.4, and log output isn't something Pg tends to consider part of the API. However I really don't like that I have to exit with exit code 1, which is normally used as failure, if I want to shut down my dynamic bgworker once it has finished the work. And this behavior is something we can set properly only once... As far as I can tell we have a couple of options: - Redefine what the exit codes mean so that exit 0 suppresses auto-restart and exits silently. Probably simplest. I'm now strongly in favour of this alternative. I've just noticed that the bgworker control interfaces do not honour bgw.bgw_restart_time = BGW_NEVER_RESTART if you exit with status zero. This means that it's not simply a problem where you can't say "restart me if I crash, but not if I exit normally". You also can't even say "never restart me" at all, because BGW_NEVER_RESTART seems to really mean BGW_NO_RESTART_ON_CRASH. This _needs_ fixing before 9.4. It seems we have consensus on what to do about this, but what we haven't got is a patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor improvement to fdwhandler.sgml
On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: The patch attached improves docs in fdwhandler.sgml a little bit. When you submit a patch, it's helpful to describe what the patch actually does, rather than just saying it makes things better. For example, I think that this patch could be described as in fdwhandler.sgml, mark references to scan_clauses with structfield tags. A problem with that idea is that scan_clauses is not a field in any struct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h must be able to compile in a #define FRONTEND context. Hmm, I had this patch in an abandoned branch from long ago, which I think helped remove postgres.h from pg_controldata. I remembered it just now because of this commit message. Maybe it's useful to re-remove the #ifndef FRONTEND from palloc.h. Hm. It would certainly be better if pg_controldata could use postgres_fe.h not postgres.h, but I'm confused about how the new header added by this patch helps that? None of the declarations you removed from xlog.h look like they'd be more problematic than the ones you left behind. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h must be able to compile in a #define FRONTEND context. Hmm, I had this patch in an abandoned branch from long ago, which I think helped remove postgres.h from pg_controldata. I remembered it just now because of this commit message. Maybe it's useful to re-remove the #ifndef FRONTEND from palloc.h. Hm. It would certainly be better if pg_controldata could use postgres_fe.h not postgres.h, but I'm confused about how the new header added by this patch helps that? None of the declarations you removed from xlog.h look like they'd be more problematic than the ones you left behind. The point IIRC was to be able to remove stuff that required type Datum to be defined: -extern void ShutdownXLOG(int code, Datum arg); which led to the idea that the new header would be about process control for xlog.c. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
Andres Freund and...@2ndquadrant.com writes: On 2014-04-28 10:03:58 -0400, Tom Lane wrote: What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. Err. I am not thinking that at all. I am pretty sure I never made that argument. The reason I want to limit the number of connections is it allows *both*, shrinking the size of BufferDescs due to less alignment padding *and* stuffing the refcount and flags into one integer. Weren't you saying you also wanted to stuff the usage count into that same integer? That's getting a little too tight for my taste, even if it would fit today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Hm. It would certainly be better if pg_controldata could use postgres_fe.h not postgres.h, but I'm confused about how the new header added by this patch helps that? None of the declarations you removed from xlog.h look like they'd be more problematic than the ones you left behind. The point IIRC was to be able to remove stuff that required type Datum to be defined: -extern void ShutdownXLOG(int code, Datum arg); which led to the idea that the new header would be about process control for xlog.c. Oh, Datum was the issue? I see. But then this approach is basically requiring that we can never reference Datum in any header that pg_controldata uses. Which seems like a pretty draconian limitation, particularly if the headers involved are general-purpose ones like xlog.h. What might be less breakable is to identify exactly which declarations pg_controldata needs out of this file and push just those into some new header. However, pg_controldata is just the tip of the iceberg --- to get any mileage out of this, we'd also have to make pg_resetxlog and pg_xlogdump able to compile with only postgres_fe.h, so there might be too much stuff involved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
Heikki Linnakangas hlinnakan...@vmware.com writes: On 04/28/2014 04:51 PM, Tom Lane wrote: I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) Sure, that would be better. Although I don't have much hope to make it completely version-independent. At the moment, pg_rewind explicitly reads the control file (yeah, it knows about that too), and checks that the catalog version matches what pg_rewind was compiled with. ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
Robert Haas robertmh...@gmail.com writes: On Sun, Apr 27, 2014 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... and not, in particular, parse analysis or rewrite time? I think breaking those out would be a good idea. Especially rewrite time. Rewrite time seems generally negligible in comparison to the other two components, at least in the simple testing I did yesterday. It would only be significant if you were expanding some complicated views, in which case planning time would almost surely dominate anyway. Anyway, I'm starting to come to the conclusion that the idea of silently adding parse/rewrite time into the planning time line isn't such a good one. So there may or may not be sufficient interest in the other numbers to justify adding them as separate lines later --- but the key word there is later. I now think we should leave planning time as it's currently defined, which means we don't need to address this issue for 9.4. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 10:57:12 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-28 10:03:58 -0400, Tom Lane wrote: What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. Err. I am not thinking that at all. I am pretty sure I never made that argument. The reason I want to limit the number of connections is it allows *both*, shrinking the size of BufferDescs due to less alignment padding *and* stuffing the refcount and flags into one integer. Weren't you saying you also wanted to stuff the usage count into that same integer? That's getting a little too tight for my taste, even if it would fit today. That's a possible additional optimization that we could use. But it's certainly not required. Would allow us to use fewer atomic operations... Right now there'd be enough space for a more precise usagecount and more flags. ATM there's 9 bits for flags and 3 bits of usagecount... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Tom Lane wrote: What might be less breakable is to identify exactly which declarations pg_controldata needs out of this file and push just those into some new header. Yeah, I also thought about that but didn't get around to trying. However, pg_controldata is just the tip of the iceberg --- to get any mileage out of this, we'd also have to make pg_resetxlog and pg_xlogdump able to compile with only postgres_fe.h, so there might be too much stuff involved. IIRC we didn't have pg_xlogdump when I wrote this, and I checked pg_resetxlog and it wasn't nearly as simple as pg_controldata. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
On Mon, Apr 28, 2014 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Apr 27, 2014 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... and not, in particular, parse analysis or rewrite time? I think breaking those out would be a good idea. Especially rewrite time. Rewrite time seems generally negligible in comparison to the other two components, at least in the simple testing I did yesterday. It would only be significant if you were expanding some complicated views, in which case planning time would almost surely dominate anyway. Anyway, I'm starting to come to the conclusion that the idea of silently adding parse/rewrite time into the planning time line isn't such a good one. So there may or may not be sufficient interest in the other numbers to justify adding them as separate lines later --- but the key word there is later. I now think we should leave planning time as it's currently defined, which means we don't need to address this issue for 9.4. Works for me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 04/28/2014 04:51 PM, Tom Lane wrote: I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) Sure, that would be better. Although I don't have much hope to make it completely version-independent. At the moment, pg_rewind explicitly reads the control file (yeah, it knows about that too), and checks that the catalog version matches what pg_rewind was compiled with. ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. The CATALOG_VERSION dependency in that code is a mistake which I didn't notice back then. I can't put too much thought into this issue at this time, but printing fork numbers rather than names seems pretty user-unfriendly to me. Rather than a revert of the whole patch I would hope for some different solution, if possible, though I can't offer anything right now. I don't think it's very likely that we would renumber forks; so the only possible problem would be that pg_rewind is linked with an older libpgcommon than the server which doesn't know some newer fork name/number and fails to produce a correct result. But ISTM we can rightly consider that as pilot error, right? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implied BETWEEN from join quals
On 28 April 2014 15:01, Robert Haas robertmh...@gmail.com wrote: Interesting. This can pretty obviously produce a big win if things break our way. But it'll take some effort to determine whether the range of possible values for the join column is narrow enough to make the inferred BETWEEN clause selective enough to be useful, and that effort will be wasted if it turns out that the answer is no. I can certainly think of times when this would have helped me, so I kind of like the idea in theory ... but I won't be surprised if it turns out to be possible to construct realistic test cases where trying to figure this out causes too much increase in planning time. Actually, it seems possible to me that this could end up winning even if it doesn't enable the use of an index scan rather than a sequential scan, because eliminating rows during the scan is typically much cheaper than eliminating them during the join step. But it also seems possible that it could lose heavily in that case, if the extra steps during the scan phase don't eliminate enough rows to reduce the join cost to a sufficient degree. I'm not really sure how it will work out on balance. I agree it's not always a win and that the planning overhead is a little high. That's why I suggest we only attempt to calculate such a plan if the cost of the normal plan for the scan is already high. That way the extra planning is worth it. I'm sure there are other optimizations that we might only wish to consider for plans that are otherwise high cost. This need hardly touch the main planning path for short queries at all. The extra run-time cost of the test adds 2*cpu_operator_cost to the scan. However, it will save at least (OldSelectivity - NewSelectivity)*(cpu_tuple_cost + cpu_operator_cost) in the join node directly above it (and possibly more above that - though without being able to inspect the join tree we won't know that until later - and the output plan might actually change the later join plan anyway).
So we can calculate the threshold at which we should choose the modified plan, though it will likely be a conservative value. The new selectivity can be calculated just as other selectivities, I see no problem there. So we have a proposed way to limit planning time in cases where the benefit would be low. Plus we have a way to estimate the reduction in cost once we add the new tests. The real value is whether we can use the additional tests to pick up an index we hadn't been able to use before. Given that MinMax index use will be a win even for fairly high selectivities (because of the low cost of index access), it looks like this optimization will play very nicely with MinMax. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
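Simon's break-even arithmetic can be sketched numerically. The per-row accounting below is an assumption made to turn his per-node formula into something concrete, using PostgreSQL's default cost settings:

```python
# Back-of-the-envelope version of the break-even test for adding
# implied BETWEEN quals, with PostgreSQL's default cost parameters.
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

def worth_adding_between(scan_rows, old_sel, new_sel):
    """Return True if the implied BETWEEN quals look like a win."""
    # Two extra comparisons (>= min, <= max) evaluated per scanned row.
    extra_scan_cost = 2 * cpu_operator_cost * scan_rows
    # Rows eliminated during the scan no longer pay the per-tuple
    # overhead of the join node directly above it.
    rows_saved = (old_sel - new_sel) * scan_rows
    join_savings = rows_saved * (cpu_tuple_cost + cpu_operator_cost)
    return join_savings > extra_scan_cost

# Highly selective implied quals: clearly worth it.
assert worth_adding_between(1_000_000, 1.0, 0.01)
# Quals that filter almost nothing: not worth the extra comparisons.
assert not worth_adding_between(1_000_000, 1.0, 0.99)
```

As the thread notes, this is a conservative lower bound on the savings: it ignores higher join levels and any index (e.g. MinMax) that the new quals might enable.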
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where everything is hung turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. The CATALOG_VERSION dependency in that code is a mistake which I didn't notice back then. I can't put too much thought into this issue at this time, but printing fork numbers rather than names seems pretty user-unfriendly to me. Rather than a revert of the whole patch I would hope for some different solution, if possible, though I can't offer anything right now. I think it would be okay to have a common/ module that encapsulates fork names/numbers. It's relpath() and particularly TABLESPACE_VERSION_DIRECTORY that bother me from a dependency standpoint. As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. So I wouldn't mind dropping that dependency on relpath. Not sure what to do for Heikki's pg_rewind, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. The CATALOG_VERSION dependency in that code is a mistake which I didn't notice back then. I can't put too much thought into this issue at this time, but printing fork numbers rather than names seems pretty user-unfriendly to me. Rather than a revert of the whole patch I would hope for some different solution, if possible, though I can't offer anything right now. I think it would be okay to have a common/ module that encapsulates fork names/numbers. It's relpath() and particularly TABLESPACE_VERSION_DIRECTORY that bother me from a dependency standpoint. As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planned downtime @ Rackspace
All, Unfortunately, Rackspace has had to postpone this pending finalization of the switch configuration. We'll send out an update when we have a (new) finalized downtime window. Thanks, Stephen * Stephen Frost (sfr...@snowman.net) wrote: Greetings, As some may be aware, we are currently working with Rackspace to upgrade the PostgreSQL infrastructure systems which they graciously host for us. As part of these upgrades there will be downtime for systems hosted there. Our first planned downtime will be for ~ 2 hours on Monday, April 28th, between 1500-2000 UTC (11am-4pm US/Eastern). We have requested that the outage be towards the end of that window (1800-2000 UTC), but there is no guarantee that they'll be able to support that request. Clearly, we hope the actual downtime will be less than 2 hours. This downtime will impact all systems hosted @ Rackspace as they are upgrading the switch for us. End-user services which will be impacted include: yum.postgresql.org wiki.postgresql.org git master server (Committers only) planet.postgresql.org www.pgadmin.org, ftp.pgadmin.org media.postgresql.org commitfest.postgresql.org developer.postgresql.org (Developer personal homepages) redmine.postgresql.org jdbc.postgresql.org postgresopen.org developer.pgadmin.org (sandbox, Jenkins) babel.postgresql.org (Translation services) Redundant services (minimal impact expected): ns2.postgresql.org (other nameservers will still work) .us inbound mail relay (other MXs will still work) .us website front-end (should be removed from pool) FTP mirror (should be removed from pool) We anticipate this being the only whole-environment outage during this migration. Future outages will happen for individual systems as we migrate them to the new hardware. Thanks! Stephen
Re: [HACKERS] Different behaviour of concate() and concate operator ||
The missing bit of context is that concat() is there because early on in Postgres's life there was an effort to have a full suite of Oracle compatibility functions. If someone suggested it now they would be pushed towards making it an extension or pointed at EDB. But things like concat are the remnants of that.
Re: [HACKERS] includedir_internal headers are not self-contained
Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. So yeah, it's an anti-feature. BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. Let's not even mention the missing "not" in the message text. regards, tom lane
Re: [HACKERS] Different behaviour of concate() and concate operator ||
Greg Stark st...@mit.edu writes: The missing bit of context is that concat() is there because early on in Postgres's life there was an effort to have a full suite of Oracle compatibility functions. If someone suggested it now they would be pushed towards making it an extension or pointed at EDB. But things like concat are the remnants of that. Well, that's historical revisionism, because concat() was added in 9.1. But if it was defined this way for Oracle compatibility, that makes sense, because Oracle doesn't distinguish NULL from empty strings. So they pretty much would have to make concat() treat NULL as empty. regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks.
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On 2014-04-28 09:54:49 -0400, Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. E.g. manually scheduling the full table vacuums to happen during low activity periods is a very good idea on busy servers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Claudio Freire klaussfre...@gmail.com writes: On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks. Actually, having just looked at the code in question, I think this whole thread is based on an obsolete assumption. AFAICS, since commit b19e4250b manual vacuum behaves exactly like autovacuum as far as getting kicked off the exclusive lock is concerned. There's certainly not any tests for autovacuum in lazy_truncate_heap() today. regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 6:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. Autovac is also going to promote itself to uninterruptible once every 150e6 transactions (by the default settings). To stop using manual vacuums is not going to be a complete cure anyway. It would be nice to know why the scripts are doing the manual vacuum. Just out of mythology, or is there an identifiable reason? Cheers, Jeff
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On 2014-04-28 13:58:10 -0400, Tom Lane wrote: Claudio Freire klaussfre...@gmail.com writes: On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks. Actually, having just looked at the code in question, I think this whole thread is based on an obsolete assumption. AFAICS, since commit b19e4250b manual vacuum behaves exactly like autovacuum as far as getting kicked off the exclusive lock is concerned. There's certainly not any tests for autovacuum in lazy_truncate_heap() today. I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 10:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com writes: On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks. Actually, having just looked at the code in question, I think this whole thread is based on an obsolete assumption. AFAICS, since commit b19e4250b manual vacuum behaves exactly like autovacuum as far as getting kicked off the exclusive lock is concerned. There's certainly not any tests for autovacuum in lazy_truncate_heap() today. I assumed he was talking about the SHARE UPDATE EXCLUSIVE used during the main work, not the ACCESS EXCLUSIVE one used during truncation. Cheers, Jeff
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Andres Freund and...@2ndquadrant.com writes: I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. It's a bit of a stretch to claim that a manual VACUUM should be cancelled by a manual DDL action elsewhere. Who's to say which of those things should have priority? regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On 2014-04-28 14:05:04 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. It's a bit of a stretch to claim that a manual VACUUM should be cancelled by a manual DDL action elsewhere. Who's to say which of those things should have priority? Yea, I am not that sure about the feature either. It sure would need to be optional. Often enough VACUUMs are scripted to run during off hours, for those it possibly makes sense. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 11:05 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. It's a bit of a stretch to claim that a manual VACUUM should be cancelled by a manual DDL action elsewhere. Who's to say which of those things should have priority? The proposal was to add either a GUC, or a syntax to the vacuum command, so it would be either the DBA or the invoker of the vacuum which is the one to say. Either one does seem a reasonable place to have such a say, although perhaps not worth the effort to implement. Cheers, Jeff
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 28, 2014 at 6:02 AM, Robert Haas robertmh...@gmail.com wrote: Also true. But the problem is that it is very rarely, if ever, the case that all pages are *equally* hot. On a pgbench workload, for example, I'm very confident that while there's not really any cold data, the btree roots and visibility map pages are a whole lot hotter than a randomly-selected heap page. If you evict a heap page, you're going to need it back pretty quick, because it won't be long until the random-number generator again chooses a key that happens to be located on that page. But if you evict the root of the btree index, you're going to need it back *immediately*, because the very next query, no matter what key it's looking for, is going to need that page. I'm pretty sure that's a significant difference. I emphasized leaf pages because even with master the root and inner pages are still going to be so hot as to make them constantly in cache, at least with pgbench's use of a uniform distribution. You'd have to have an absolutely enormous scale factor before this might not be the case. As such, I'm not all that worried about inner pages when performing these simple benchmarks. However, in the case of the pgbench_accounts table, each of the B-Tree leaf pages that comprise about 99.5% of the total is still going to be about six times more frequently accessed than each heap page. That's a small enough difference for it to easily go unappreciated, and yet a big enough difference for it to hurt a lot. -- Peter Geoghegan
Re: [HACKERS] includedir_internal headers are not self-contained
On Mon, Apr 28, 2014 at 1:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. Yeah, technically it's a lie, but ls <copy-and-paste-here>* is pretty handy. If you format it some other way it's annoying to reformat it. So yeah, it's an anti-feature. BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. Let's not even mention the missing "not" in the message text. Uggh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Fri, Apr 25, 2014 at 10:45 AM, Peter Geoghegan p...@heroku.com wrote: I've now done a non-limited comparative benchmark of master against the patch (once again, with usage_count starting at 6, and BM_MAX_USAGE_COUNT at 30) with a Gaussian distribution. Once again, the distribution threshold used was consistently 5.0, causing the patched pgbench to report for each test: transaction type: Custom query scaling factor: 5000 standard deviation threshold: 5.0 access probability of top 20%, 10% and 5% records: 0.68269 0.38293 0.19741 Results are available from: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/3-sec-delay-gauss/ I updated this with various changes in bgwriter configuration. Perhaps unsurprisingly, disabling the background writer entirely helps for both master and patched. It is perhaps notable that the largest difference between two comparable patch + master test runs is seen when the background writer is disabled entirely, and 32 clients. -- Peter Geoghegan
Re: [HACKERS] includedir_internal headers are not self-contained
Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 1:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. Yeah, technically it's a lie, but ls <copy-and-paste-here>* is pretty handy. If you format it some other way it's annoying to reformat it. Handy for what? How often do you need to do that? (And if you do do it, how often will you remember that the filename is only approximate?) regards, tom lane
Re: [HACKERS] includedir_internal headers are not self-contained
On 2014-04-28 14:44:16 -0400, Robert Haas wrote: BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. Let's not even mention the missing "not" in the message text. Uggh. I'll send a fix once I am home (~3h). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] includedir_internal headers are not self-contained
On Mon, Apr 28, 2014 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 1:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. Yeah, technically it's a lie, but ls <copy-and-paste-here>* is pretty handy. If you format it some other way it's annoying to reformat it. Handy for what? How often do you need to do that? (And if you do do it, how often will you remember that the filename is only approximate?) *shrug* I think if you're looking at the output of xact_desc_commit() and you *don't* know that the filenames are approximate (or at least that you should Use The Source, Luke) then you're probably in way over your head. I have to admit it's been a few years since I've had to play with WAL_DEBUG, so I don't really remember what I was trying to do. But I don't see any real argument that three slash-separated numbers will be more useful to somebody who has to dig through this than a pathname, even an approximate pathname, and I think people wanting to figure out approximately where they need to look to find the data affected by the WAL record will be pretty common among people decoding WAL records. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] includedir_internal headers are not self-contained
Robert Haas robertmh...@gmail.com writes: I have to admit it's been a few years since I've had to play with WAL_DEBUG, so I don't really remember what I was trying to do. But I don't see any real argument that three slash-separated numbers will be more useful to somebody who has to dig through this than a pathname, even an approximate pathname, and I think people wanting to figure out approximately where they need to look to find the data affected by the WAL record will be pretty common among people decoding WAL records. Meh. I still think it's a bad idea to have CATALOG_VERSION_NO getting compiled into libpgcommon.a, where there will be no way to cross-check that it matches anything. But I guess I'm losing this argument. However, we've absolutely got to get that reference out of common/relpath.h. The usage of RelFileNode there is problematic as well. How about we change common/relpath.[hc] to export a single version of relpath() that takes its arguments as separate plain OIDs, and then make the other versions wrappers that are only declared in some backend header? The wrappers could possibly be macros, allowing things like pg_xlogdump to keep using them as long as they didn't mind importing backend headers. (Though for the RelFileNode case this would imply multiple evaluation of the macro argument, so maybe it's not such a great idea.) regards, tom lane
Re: [HACKERS] bgworker crashed or not?
On 28/04/14 16:27, Robert Haas wrote: On Thu, Apr 24, 2014 at 1:39 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 04/17/2014 08:35 AM, Craig Ringer wrote: I've just noticed that the bgworker control interfaces do not honour bgw.bgw_restart_time = BGW_NEVER_RESTART if you exit with status zero. This means that it's not simply a problem where you can't say "restart me if I crash, but not if I exit normally". You also can't even say "never restart me at all", because BGW_NEVER_RESTART seems to really mean BGW_NO_RESTART_ON_CRASH. This _needs_ fixing before 9.4. It seems we have consensus on what to do about this, but what we haven't got is a patch. If you mean the consensus that exit status 0 should mean permanent stop, then I think the patch can be as simple as attached. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b573fd8..be211f0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2848,7 +2848,10 @@ CleanupBackgroundWorker(int pid,
 		if (!EXIT_STATUS_0(exitstatus))
 			rw->rw_crashed_at = GetCurrentTimestamp();
 		else
+		{
 			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
+		}
 
 		/*
 		 * Additionally, for shared-memory-connected workers, just like a
Re: [HACKERS] shm_mq inconsistent behavior of SHM_MQ_DETACHED
On 28/04/14 15:36, Robert Haas wrote: On Tue, Apr 22, 2014 at 9:55 AM, Petr Jelinek p...@2ndquadrant.com wrote: But if I do first receive after detach like in this sequence: P1 - set_sender P1 - attach P2 - set_receiver P2 - attach P1 - send P1 - send P1 - detach P2 - receive I get SHM_MQ_DETACHED on the receiver even though there are messages in the ring buffer. That's a bug. I'm thinking that the problem is really revolves around shm_mq_wait_internal(). It returns true if the queue is attached but not detached, and false if either the detach has already happened, or if we establish via the background worker handle that it will never come. But in the case of receiving, we want to treat attached-then-detached as a success case, not a failure case. Can you see if the attached patch fixes it? Yes, the patch fixes it for me. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] includedir_internal headers are not self-contained
On 2014-04-28 13:20:47 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. So yeah, it's an anti-feature. BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. I don't think it's really comparable to the other scenarios. We should print the oid, just as relation_open() does, but the filenode is also rather helpful here. How about the attached? If we had a version of relpath() that didn't require the fsm, I'd use it. If you prefer, I'd be happy enough to replace it with spcNode/dbNode/relNode. That's more than sufficient here. Let's not even mention the missing "not" in the message text. That's clearly wrong.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4493930..0b9731b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1362,14 +1362,14 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 				change->data.tp.oldtuple == NULL)
 				continue;
 			else if (reloid == InvalidOid)
-				elog(ERROR, "could not lookup relation %s",
+				elog(ERROR, "could not map filenode %s to its oid",
 					 relpathperm(change->data.tp.relnode, MAIN_FORKNUM));
 
 			relation = RelationIdGetRelation(reloid);
 
 			if (relation == NULL)
-				elog(ERROR, "could open relation descriptor %s",
-					 relpathperm(change->data.tp.relnode, MAIN_FORKNUM));
+				elog(ERROR, "could not open relation with oid %u, filenode %s",
+					 reloid, relpathperm(change->data.tp.relnode, MAIN_FORKNUM));
 
 			if (RelationIsLogicallyLogged(relation))
 			{
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either. I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees. Hmm. If we have something like expression_tree_walker/mutator for plan nodes, we can pass a walker/mutator function's pointer instead of exposing static functions that take on the recursive jobs. If a custom-plan provider (one that has sub-plans) got a callback with the walker/mutator pointer, all it has to do for its sub-plans is call this new plan-tree walking support routine with the supplied walker/mutator. That seems to me a simpler design than what I did. I tried to code similar walker/mutator functions on the plan-node tree; however, it was not possible to implement these routines simply enough, because the jobs of the walker/mutator functions are not uniform, so the caller side must also have large switch-case branches. I picked up setrefs.c for my investigation. set_plan_refs() applies fix_scan_list() on the expression tree appearing in the plan node if it is derived from Scan; however, it also applies set_join_references() for subclasses of Join, or set_dummy_tlist_references() for some other plan nodes. It implies that the walker/mutator functions for a Plan node have to apply different operations according to the type of Plan node.
I'm not certain how many different forms are needed. (In addition, set_plan_refs() usually behaves like a walker, but sometimes behaves as a mutator, e.g. for a trivial subquery.) I'm expecting a function like the one below. It allows the plan_walker function to be called for each plan node, and also allows the expr_walker function to be called for each expression node on the plan node. bool plan_tree_walker(Plan *plan, bool (*plan_walker) (), bool (*expr_walker) (), void *context) I'd like to hear whether some other form would be better for implementing this routine. One alternative idea to give a custom-plan provider a chance to handle its subplans is to give it function pointers (1) to handle recursion of the plan tree and (2) to set up the backend's internal state. In case of setrefs.c, set_plan_refs() and fix_expr_common() are the minimum necessity for extensions. It also removes the need to export static functions. What do you think? -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 4/28/14, 8:04 AM, Robert Haas wrote: On Mon, Apr 21, 2014 at 6:38 PM, Jim Nasby j...@nasby.net wrote: I feel that if there is no memory pressure, frankly it doesn't matter much about what gets out and what not. The case I am specifically targeting is when the clocksweep gets to move about a lot i.e. high memory pressure workloads. Of course, I may be totally wrong here. Well, there's either memory pressure or there isn't. If there isn't then it's all moot *because we're not evicting anything*. I don't think that's really true. A workload can fit within shared_buffers at some times and spill beyond it at others. Every time it fits within shared_buffers for even a short period of time, the reference count of any buffer that's not ice-cold goes to 5 and we essentially lose all knowledge of which buffers are relatively hotter. Then, when we spill out again, evictions are random. That's a separate problem, but yes, even though we're not evicting something we can still end up with every buffer marked as equally important. OSes handle this by splitting pages between active and inactive, and maintaining a relative balance between the two (actually a bit more complex because there's a separate inactive/dirty pool). In our case this could maybe be handled by simply not incrementing counts when there's no eviction... but I'm more a fan of separate pools/clocks, because that means you can do things like a LFU for active and an LRU for inactive. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Problem with displaying wide tables in psql
Please fix this compiler warning. I think it came from this patch.

print.c: In function ‘print_aligned_vertical’:
print.c:1354:10: error: pointer targets in passing argument 1 of ‘strlen_max_width’ differ in signedness [-Werror=pointer-sign]
          encoding);
          ^
print.c:126:12: note: expected ‘unsigned char *’ but argument is of type ‘char *’
 static int strlen_max_width(unsigned char *str, int *target_width, int encoding);
            ^
Re: [HACKERS] Problem with displaying wide tables in psql
On Tue, Apr 29, 2014 at 12:37 PM, Sergey Muraviov sergey.k.murav...@gmail.com wrote: 2014-04-29 5:52 GMT+04:00 Peter Eisentraut pete...@gmx.net: Please fix this compiler warning. I think it came from this patch.

print.c: In function 'print_aligned_vertical':
print.c:1354:10: error: pointer targets in passing argument 1 of 'strlen_max_width' differ in signedness [-Werror=pointer-sign]
          encoding);
          ^
print.c:126:12: note: expected 'unsigned char *' but argument is of type 'char *'
 static int strlen_max_width(unsigned char *str, int *target_width, int encoding);
            ^

fixed

Your patch was committed by Greg not so long ago; you should send a patch for this warning rebased on the latest master branch commit :) -- Michael
Re: [HACKERS] Problem with displaying wide tables in psql
rebased

2014-04-29 7:43 GMT+04:00 Michael Paquier michael.paqu...@gmail.com: On Tue, Apr 29, 2014 at 12:37 PM, Sergey Muraviov sergey.k.murav...@gmail.com wrote: 2014-04-29 5:52 GMT+04:00 Peter Eisentraut pete...@gmx.net: Please fix this compiler warning. I think it came from this patch.

print.c: In function 'print_aligned_vertical':
print.c:1354:10: error: pointer targets in passing argument 1 of 'strlen_max_width' differ in signedness [-Werror=pointer-sign]
          encoding);
          ^
print.c:126:12: note: expected 'unsigned char *' but argument is of type 'char *'
 static int strlen_max_width(unsigned char *str, int *target_width, int encoding);
            ^

fixed

Your patch was committed by Greg not so long ago; you should send a patch for this warning rebased on the latest master branch commit :) -- Michael

-- Best regards, Sergey Muraviov

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 0eb..b2f56a3 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -1350,8 +1350,7 @@ print_aligned_vertical(const printTableContent *cont, FILE *fout)
 {
 	int swidth, twidth = hwidth + 1;
 	fputs(hline ? format->header_nl_left : "", fout);
-	strlen_max_width((char *) hlineptr[hline].ptr, &twidth,
-					 encoding);
+	strlen_max_width(hlineptr[hline].ptr, &twidth, encoding);
 	fprintf(fout, "%-s", hlineptr[hline].ptr);
 	swidth = hwidth - twidth;
Re: [HACKERS] Different behaviour of concate() and concate operator ||
On Monday, 28 April 2014 10:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: So they pretty much would have to make concat() treat NULL as empty.

Could it be problematic if, in my production environment, I update the pg_operator catalog entry for || to call the concat() function instead of textcat()? For example:

 oprname | oprleft | oprright | oprresult | oprcode
---------+---------+----------+-----------+---------
 ||      |      25 |       25 |        25 | concat
(1 row)

The result would be:

test=# select 'abc' || NULL;
 ?column?
----------
 abc
(1 row)

Thanks for your help. Regards, Amul Sul
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
On 10 April 2014 14:21, I wrote: I shall perform some more tests; for that I need to hack the code some more, and I will post the results soon.

Test scenario:
Create table t1 (a int, b int);
Create table t2 (a int, b int);
Random records inserted into t1 and t2, as per the attached files (10K records inserted in both tables).
Performance is measured for the query: select count(*) from t1, t2 where t1.b < t2.b;

Test result:
Nested Loop Join: Time: 36038.842 ms
Merge Join: Time: 19774.975 ms
Number of records selected: 42291979

I have done some more testing with an index and multiple conditions.

Test scenario:
Create table t1 (a int, b int);
Create table t2 (a int, b int);
Create index t1_idx t1(b);
Create index t2_idx t2(b);
Query: select count(*) from t1, t2 where t1.b < t2.b and t1.b < 12000;

Test result:
Nested Loop Join with index scan: 1653.506 ms
Sort merge join (seq scan): 610.257 ms

In both of the above scenarios, the sort merge join for the < operator is faster than the NLJ (using either a seq scan or an index scan). Any suggestions for other performance scenarios are welcome.

Thanks & Regards, Dilip Kumar