Re: [HACKERS] pgbench internal contention
Robert Haas writes: > On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane wrote: >> Portability, or rather lack of it. What about using erand48, which we >> already have a dependency on (and substitute code for)? > Neither our implementation nor glibc's appears to be thread-safe, I think you're confusing srand48 with erand48. The latter relies on a caller-supplied state value, so if it's not thread-safe the caller has only itself to blame. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
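Tom's point above boils down to the erand48() signature: unlike srand48()/lrand48(), which share one hidden global state, erand48() operates on a 48-bit state array the caller supplies, so giving each thread its own array makes it lock-free. A minimal sketch (the wrapper names here are illustrative, not from any patch):

```c
#include <stdlib.h>

/*
 * Each thread owns its own 48-bit state array, so concurrent calls to
 * erand48() never touch shared data -- no lock needed, unlike random().
 */
typedef struct
{
	unsigned short xseed[3];	/* per-thread erand48 state */
} pg_erand48_state;

static void
rng_seed(pg_erand48_state *st, unsigned short a, unsigned short b, unsigned short c)
{
	st->xseed[0] = a;
	st->xseed[1] = b;
	st->xseed[2] = c;
}

/* Returns a uniform double in [0.0, 1.0), advancing only this state. */
static double
rng_uniform(pg_erand48_state *st)
{
	return erand48(st->xseed);
}
```

Two states seeded identically produce identical streams, which is also handy for reproducible benchmarks.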
[HACKERS] Fix for pg_upgrade on win32
Based on EnterpriseDB testing, I have applied the attached patch to fix a pg_upgrade bug on Win32 on 9.1 and 9.2. The problem is that on Win32 you can't stat() a directory with a trailing slash --- this is already mentioned in our src/port/path.c. The patch removes a lone trailing slash.

--
Bruce Momjian    http://momjian.us
EnterpriseDB     http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
new file mode 100644
index ef21899..b632584
*** a/contrib/pg_upgrade/exec.c
--- b/contrib/pg_upgrade/exec.c
*** check_data_dir(const char *pg_data)
*** 168,174 ****
  	{
  		struct stat statBuf;
  
! 		snprintf(subDirName, sizeof(subDirName), "%s/%s", pg_data,
  				 requiredSubdirs[subdirnum]);
  
  		if (stat(subDirName, &statBuf) != 0)
--- 168,176 ----
  	{
  		struct stat statBuf;
  
! 		snprintf(subDirName, sizeof(subDirName), "%s%s%s", pg_data,
! 				 /* Win32 can't stat() a directory with a trailing slash. */
! 				 *requiredSubdirs[subdirnum] ? "/" : "",
  				 requiredSubdirs[subdirnum]);
  
  		if (stat(subDirName, &statBuf) != 0)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
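The shape of the fix can be shown outside pg_upgrade; `join_path` here is a hypothetical stand-in for the snprintf call in check_data_dir(), emitting the slash only when the subdirectory name is non-empty so the result never ends in "/":

```c
#include <stdio.h>
#include <string.h>

/*
 * Join base and sub with a '/' only when sub is non-empty, so an empty
 * subdirectory (the data directory itself) yields no trailing slash --
 * Win32's stat() rejects paths like "C:/data/".
 */
static void
join_path(char *dst, size_t dstlen, const char *base, const char *sub)
{
	snprintf(dst, dstlen, "%s%s%s", base,
			 *sub ? "/" : "",	/* separator only if sub is non-empty */
			 sub);
}
```

With `sub = ""` the old `"%s/%s"` format would have produced `"/data/"`; the three-part format produces `"/data"`.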
Re: [HACKERS] Check constraints on partition parents only?
>> Comments and further feedback, if any, appreciated. > > Did you look at how this conflicts with my patch to add not null > rows to pg_constraint? > > https://commitfest.postgresql.org/action/patch_view?id=601 > I was certainly not aware of this patch in the commitfest. Your patch has a larger footprint with more functional changes in it. IMHO, it will be easiest to queue this non-inheritable constraints patch behind your patch in the commitfest. There will be some bitrot, which I can fix once your patch gets committed. Regards, Nikhils -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incremental checkpoints
> If you make writes go out more often, they will be less efficient

I think fsync is more important, but many writes plus fsync is no good either. Suppose that 30 WAL segments are good for performance (to be written at once). In the incremental approach we can have 60 segments and still write 30 at once. There is no checkpoint_timeout - more buffers will stay in memory longer. I cannot see any disadvantage.

Jordan Ivanov

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reduced power consumption in autovacuum launcher process
Attached is revision of this patch that now treats the latch in PGPROC, waitLatch, as the generic "process latch", rather than just using it for sync rep; It is initialised appropriately as a shared latch generically, within InitProcGlobal(), and ownership is subsequently set within InitProcess(). We were doing so before, though only for the benefit of sync rep. The idea here is to have a per-process latch to guard against time-out invalidation issues from within each generic signal handler, by calling SetLatch() on our generic latch there, much as we already do from within non-generic archiver process signal handlers on the archiver's static, non-generic latch (the archiver has its MyProc pointer set to NULL, and we allow for the possibility that generic handlers may be registered within processes that have a NULL MyProc - though latch timeout invalidation issues then become the responsibility of the process exclusively, just as is currently the case with the archiver. In other words, they better not register a generic signal handler). It doesn't really matter that the SetLatch() call will usually be unnecessary, because, as Heikki once pointed out, redundantly setting a latch that is already set is very cheap. We don't check if it's set directly in advance of setting the latch, because the Latch struct is "logically opaque" and there is no "public function" to check if it's set, nor should there be; the checking simply happens in SetLatch(). On 18 July 2011 20:06, Heikki Linnakangas wrote: > Right, we can easily change the timeout argument to be in milliseconds > instead of microseconds. I've done so in this latest revision as a precautionary measure. I don't see much point in sub-millisecond granularity, and besides, the Windows implementation will not provide that granularity anyway as things stand. Thoughts? 
--
Peter Geoghegan    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6a6959f..b2cf973 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10210,7 +10210,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 9940a42..0663eb8 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -182,7 +182,7 @@ DisownLatch(volatile Latch *latch)
  * to wait for. If the latch is already set (and WL_LATCH_SET is given), the
  * function returns immediately.
  *
- * The 'timeout' is given in microseconds. It must be >= 0 if WL_TIMEOUT
+ * The 'timeout' is given in milliseconds. It must be >= 0 if WL_TIMEOUT
  * event is given, otherwise it is ignored. On some platforms, signals cause
  * the timeout to be restarted, so beware that the function can sleep for
  * several times longer than the specified timeout.
@@ -236,8 +236,9 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	if (wakeEvents & WL_TIMEOUT)
 	{
 		Assert(timeout >= 0);
-		tv.tv_sec = timeout / 1000000L;
-		tv.tv_usec = timeout % 1000000L;
+		tv.tv_sec = timeout / 1000L;
+		/* tv_usec should be in microseconds */
+		tv.tv_usec = (timeout % 1000L) * 1000L;
 		tvp = &tv;
 	}
@@ -329,6 +330,11 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 /*
  * Sets a latch and wakes up anyone waiting on it. Returns quickly if the
  * latch is already set.
+ *
+ * Note that there is a general requirement to call this function within
+ * signal handlers, to ensure that latch timeout is not invalidated. For
+ * generic signal handlers that may be registered for multiple processes,
+ * it will be called with the per-process latch in PGPROC as its argument.
  */
 void
 SetLatch(volatile Latch *latch)
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 8d59ad4..5ee4f40 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -99,23 +99,14 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock,
 	int			numevents;
 	int			result = 0;
 	int			pmdeath_eventno = 0;
-	long		timeout_ms;
 
 	Assert(wakeEvents != 0);
+	Assert(timeout >= 0 || (wakeEvents & WL_TIMEOUT) == 0);
 
 	/* Ignore WL_SOCKET_* events if no valid socket is given */
 	if (sock == PGINVALID_SOCKET)
 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
 
-	/* Convert timeout to milliseconds for WaitForMultipleObjects() */
-	if (wakeEvents & WL_TIMEOUT)
-	{
-		Assert(timeout >= 0);
-		timeout_ms = timeout / 1000;
-	}
-	else
-		timeout_ms = INFINITE;
-
 	/* Construct an array of event handles for WaitforMultipleObjects() */
 	latch
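The unit fix in unix_latch.c boils down to one conversion: split a millisecond timeout into whole seconds and a microsecond remainder for select()'s struct timeval. Sketched as a standalone helper (the function name is illustrative, not from the patch):

```c
#include <sys/time.h>

/*
 * Convert a timeout in milliseconds into the struct timeval that
 * select() expects: whole seconds, plus the remainder scaled up from
 * milliseconds to microseconds.
 */
static struct timeval
ms_to_timeval(long timeout_ms)
{
	struct timeval tv;

	tv.tv_sec = timeout_ms / 1000L;
	tv.tv_usec = (timeout_ms % 1000L) * 1000L;	/* tv_usec holds microseconds */
	return tv;
}
```

The old code divided by 1000000L because the caller passed microseconds; forgetting the final `* 1000L` scale-up is the easy mistake this hunk guards against.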
Re: [HACKERS] cataloguing NOT NULL constraints
Excerpts from Robert Haas's message of sáb jul 23 07:40:12 -0400 2011: > On Sat, Jul 23, 2011 at 4:37 AM, Dean Rasheed > wrote: > > That looks wrong to me, because a NOT NULL constraint is a column > > constraint not a table constraint. The CREATE TABLE syntax explicitly > > distinguishes these 2 cases, and only allows NOT NULLs in column > > constraints. So from a consistency point-of-view, I think that ALTER > > TABLE should follow suit. > > > > So the new syntax could be: > > > > ALTER TABLE table_name ALTER [COLUMN] col_name ADD column_constraint > > > > where column_constraint is the same as in CREATE TABLE (i.e., allowing > > all the other constraint types at the same time). > > > > It looks like that approach would probably lend itself to more > > code-reusability too, especially once we start adding options to the > > constraint. > > So you'd end up with something like this? > > ALTER TABLE foo ALTER COLUMN bar ADD CONSTRAINT somename NOT NULL > > That works for me. I think sticking the name of the constraint in > there at the end of the line as Alvaro proposed would be terrible for > future syntax extensibility - we'll be much less likely to paint > ourselves into a corner with something like this. Here's a patch that does things more or less in this way. Note that this is separate from the other patch, so while you can specify a constraint name for the NOT NULL clause, it's not stored anywhere. This is preliminary: there's no docs nor new tests. 
Here's how it works (you can also throw in PRIMARY KEY into the mix, but not EXCLUSION):

alvherre=# create table bar (a int);
CREATE TABLE
alvherre=# alter table bar alter column a add constraint foo_fk references foo initially deferred deferrable check (a <> 4) constraint a_uq unique constraint fnn not null;
NOTICE:  ALTER TABLE / ADD UNIQUE creará el índice implícito «a_uq» para la tabla «bar»
ALTER TABLE
alvherre=# \d bar
      Tabla «public.bar»
 Columna |  Tipo   | Modificadores
---------+---------+---------------
 a       | integer | not null
Índices:
    "a_uq" UNIQUE CONSTRAINT, btree (a)
Restricciones CHECK:
    "bar_a_check" CHECK (a <> 4)
Restricciones de llave foránea:
    "foo_fk" FOREIGN KEY (a) REFERENCES foo(a) DEFERRABLE INITIALLY DEFERRED

The implementation is a bit dirty (at least IMO), but I don't see a way around that, mainly because ALTER TABLE / ALTER COLUMN does not have a proper ColumnDef to stick the Constraint nodes into; so while the other constraints can do fine without that, it isn't very helpful for NOT NULL. So it has to create a phony ColumnDef for transformConstraintItems to use.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

alter_add.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RC1 / Beta4?
All,

Where are we on RC1 or Beta4 for PostgreSQL 9.1? While I know we're not going to do a final release in August because of the Europeans, it would be nice to move things along before then. There don't seem to be any blockers open.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] USECS_* constants undefined with float8 timestamps?
On Fri, Jul 29, 2011 at 11:18 AM, Johann 'Myrkraverk' Oskarsson wrote:
> Hi all,
>
> I just noticed that the USECS_* constants are not defined when the server
> is compiled without integer dates and timestamps.
>
> Explicitly, timestamp.h is
>
> #ifdef HAVE_INT64_TIMESTAMP
> #define USECS_PER_DAY    INT64CONST(86400000000)
> #define USECS_PER_HOUR   INT64CONST(3600000000)
> #define USECS_PER_MINUTE INT64CONST(60000000)
> #define USECS_PER_SEC    INT64CONST(1000000)
> #endif
>
> Is there a particular reason for this? Even with float8 timestamps
> there are uses for these constants in extensions.

I don't see any particular reason not to define them unconditionally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
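For reference, the values follow directly from 1,000,000 microseconds per second; `INT64_C` below is the standard C99 spelling of what timestamp.h writes as `INT64CONST`:

```c
#include <stdint.h>

/* Derived step by step, so each constant can be checked against the last. */
#define USECS_PER_SEC		INT64_C(1000000)
#define USECS_PER_MINUTE	(USECS_PER_SEC * 60)
#define USECS_PER_HOUR		(USECS_PER_MINUTE * 60)
#define USECS_PER_DAY		(USECS_PER_HOUR * 24)
```

Defining them as products of each other keeps a dropped zero from slipping in unnoticed.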
Re: [HACKERS] pgbench internal contention
On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane wrote:
> Robert Haas writes:
>> On machines with lots of CPU cores, pgbench can start eating up a lot
>> of system time. Investigation reveals that the problem is with
>> random(),
>
> Interesting.
>
>> I patched it to use random_r() - the patch is attached - and here are
>> the (rather gratifying) results of that test:
>> Since a client-limited benchmark isn't very interesting, I think this
>> change makes sense. Thoughts? Objections?
>
> Portability, or rather lack of it. What about using erand48, which we
> already have a dependency on (and substitute code for)?

Neither our implementation nor glibc's appears to be thread-safe, and erand48() is deprecated according to my Linux man page:

NOTES
       These functions are declared obsolete by SVID 3, which states that
       rand(3) should be used instead.

glibc provides erand48_r(), and I suppose we could kludge up something similar out of what's already in src/port? This is also not exactly the world's most sophisticated algorithm, but perhaps for pgbench that doesn't matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
On Fri, Jul 29, 2011 at 09:55:46AM -0400, Tom Lane wrote:
> The thing that was bizarre about the one instance in the buildfarm was
> that the error was persistent, ie, once a session had failed all its
> subsequent attempts to access pg_class failed too. I gather from Dave's
> description that it's working that way for him too. I can think of ways
> that there might be a transient race condition against a REINDEX, but
> it's very unclear why the failure would persist across multiple
> attempts. The best idea I can come up with is that the session has
> somehow cached a wrong commit status for the reindexing transaction,
> causing it to believe that both old and new copies of the index's
> pg_class row are dead ... but how could that happen?

It is definitely persistent. Once triggered the error occurs for any new statement until the session exits.

> The underlying state in the catalog is not wrong, because no concurrent
> sessions are upset (at least not in the buildfarm case ... Dave, do you
> see more than one session doing this at a time?).

It looks like it happens to multiple sessions so far as one can tell from the timestamps of the errors:

 timestamp     session id     error
 ------------  -------------  ----------------------------------------------
 03:05:37.434  4e26a861.4a6d  could not find pg_class tuple for index 2662
 03:05:37.434  4e26a861.4a6f  could not find pg_class tuple for index 2662
 03:06:12.041  4e26a731.438e  could not find pg_class tuple for index 2662
 03:06:12.042  4e21b6a3.629b  could not find pg_class tuple for index 2662
 03:06:12.042  4e26a723.42ec  could not find pg_class tuple for index 2662 at character 13

-dg

--
David Gould  da...@sonic.net  510 536 1443  510 282 0869
If simplicity worked, the world would be overrun with insects.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench internal contention
Robert Haas writes: > On machines with lots of CPU cores, pgbench can start eating up a lot > of system time. Investigation reveals that the problem is with > random(), Interesting. > I patched it to use random_r() - the patch is attached - and here are > the (rather gratifying) results of that test: > Since a client-limited benchmark isn't very interesting, I think this > change makes sense. Thoughts? Objections? Portability, or rather lack of it. What about using erand48, which we already have a dependency on (and substitute code for)? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pgbench internal contention
On machines with lots of CPU cores, pgbench can start eating up a lot of system time. Investigation reveals that the problem is with random(), which glibc implements like this:

long int
__random ()
{
  int32_t retval;

  __libc_lock_lock (lock);
  (void) __random_r (&unsafe_state, &retval);
  __libc_lock_unlock (lock);

  return retval;
}
weak_alias (__random, random)

Rather obviously, if you're running enough pgbench threads, you're going to have a pretty ugly point of contention there. On the 32-core machine provided by Nate Boley, with my usual 5-minute SELECT-only test, lazy-vxid and sinval-fastmessages applied, and scale factor 100, "time" shows that pgbench uses almost as much system time as user time:

$ time pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 55319555
tps = 184396.016257 (including connections establishing)
tps = 184410.926840 (excluding connections establishing)

real    5m0.019s
user    21m10.100s
sys     17m45.480s

I patched it to use random_r() - the patch is attached - and here are the (rather gratifying) results of that test:

$ time ./pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 71851589
tps = 239503.585813 (including connections establishing)
tps = 239521.816698 (excluding connections establishing)

real    5m0.016s
user    20m40.880s
sys     9m25.930s

Since a client-limited benchmark isn't very interesting, I think this change makes sense. Thoughts? Objections? Coding style improvements?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company random_r.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
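The essence of the approach — give each pgbench thread its own generator state so random_r() never takes a process-wide lock — can be sketched like this (glibc-specific; the wrapper names are illustrative, not taken from the attached patch):

```c
#define _GNU_SOURCE				/* random_r/initstate_r are glibc extensions */
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* One of these per thread: private state means no shared lock to fight over. */
typedef struct
{
	struct random_data data;
	char		statebuf[128];
} thread_rng;

static void
thread_rng_init(thread_rng *rng, unsigned int seed)
{
	/* glibc requires random_data to be zeroed before initstate_r() */
	memset(rng, 0, sizeof(*rng));
	initstate_r(seed, rng->statebuf, sizeof(rng->statebuf), &rng->data);
}

static int32_t
thread_rng_next(thread_rng *rng)
{
	int32_t		result;

	random_r(&rng->data, &result);
	return result;
}
```

Two generators seeded alike produce identical sequences, so per-thread seeding also keeps runs reproducible. Tom's portability objection applies here too: random_r() exists only on glibc.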
Re: [HACKERS] SSI error messages
Alvaro Herrera writes: > Excerpts from Peter Eisentraut's message of vie jul 29 14:46:20 -0400 2011: >> On lör, 2011-07-16 at 21:55 +0300, Heikki Linnakangas wrote: >> Do you have an idea how to address this case: > Call sprintf to expand the %u before ereport()? That sounds like way too much work for a wording "improvement" that is exceedingly marginal in the first place. IMO the only thing really wrong with those messages is that they aren't complete sentences, which is a style guideline that we ought to try to adhere to even if we're not bothering to translate them. Perhaps "Transaction canceled because of ..." would read better than "Canceled on ..."? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI error messages
Excerpts from Peter Eisentraut's message of vie jul 29 14:46:20 -0400 2011: > On lör, 2011-07-16 at 21:55 +0300, Heikki Linnakangas wrote: > > I think I would prefer something like this: > > > > ERROR: could not serialize access due to read/write dependencies > > among > > transactions > > DETAIL: Reason code: %s > > HINT: The transaction might succeed if retried. > > > > Where %s gets the current detail field, untranslated, like: > > > > Canceled on commit attempt with conflict in from prepared pivot. > > Do you have an idea how to address this case: Call sprintf to expand the %u before ereport()? > @@ -3865,7 +3865,7 @@ CheckForSerializableConflictOut(bool visible, Relation > relation, > ereport(ERROR, > (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), > errmsg("could not serialize access due to read/write > dependencies among transactions"), > -errdetail_internal("Canceled on conflict out to old > pivot %u.", xid), > +errdetail("Reason code: %s.", "canceled on conflict > out to old pivot %u", xid), // XXX bogus > errhint("The transaction might succeed if retried."))); -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
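Alvaro's suggestion amounts to doing the %u expansion in C before ereport() sees the string, so the errdetail() format itself stays a fixed, untranslated "%s". A hypothetical helper (not actual PostgreSQL code) makes the shape concrete:

```c
#include <stdio.h>
#include <string.h>

/*
 * Expand the transaction id into the untranslated reason string first;
 * the caller then passes the finished string to something like
 * errdetail("Reason code: %s.", reason).
 */
static const char *
format_reason(char *buf, size_t buflen, unsigned int xid)
{
	snprintf(buf, buflen, "canceled on conflict out to old pivot %u", xid);
	return buf;
}
```

The reason string stays a single %s argument, which is what keeps it out of the translation catalog.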
Re: [HACKERS] Check constraints on partition parents only?
Excerpts from Nikhil Sontakke's message of vie jul 29 14:12:37 -0400 2011: > Hi all, > > PFA, patch which implements non-inheritable "ONLY" constraints. This > has been achieved by introducing a new column "conisonly" in > pg_constraint catalog. Specification of 'ONLY' in the ALTER TABLE ADD > CONSTRAINT CHECK command is used to set this new column to true. > Constraints which have this column set to true cannot be inherited by > present and future children ever. > > The psql and pg_dump binaries have been modified to account for such > persistent non-inheritable check constraints. This patch also has > documentation changes along with relevant changes to the test cases. > The regression runs pass fine with this patch applied. > > Comments and further feedback, if any, appreciated. Did you look at how this conflicts with my patch to add not null rows to pg_constraint? https://commitfest.postgresql.org/action/patch_view?id=601 -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incremental checkpoints
On 07/29/2011 11:04 AM, jord...@go-link.net wrote:
> I think that current implementation of checkpoints is not good for huge
> shared buffer cache and for many WAL segments. If there is more buffers
> and if buffers can be written rarely more updates of buffers can be
> combined so total number of writes to disk will be significantly less.
> I think that incremental checkpoints can achieve this goal (maybe more)
> and price is additional memory (about 1/1000 of size of buffer cache).

The current code optimizes for buffers that are written frequently. Those will sit in shared_buffers and in the hoped for case, only be written once at checkpoint time. There are two issues with adopting incremental checkpoints instead, one fundamental, the other solvable but not started on yet:

1) Postponing writes as long as possible always improves the resulting throughput of those writes. Any incremental checkpoint approach will detune throughput by some amount. If you make writes go out more often, they will be less efficient; that's just how things work if you benchmark anything that allows write combining. Any incremental checkpoint approach is likely to improve latency in some cases if it works well, while decreasing throughput in most cases.

2) The incremental checkpoint approach used by other databases, such as the MySQL implementation, works by tracking what transaction IDs were associated with a buffer update. The current way PostgreSQL saves buffer sync information for the checkpoint to process things doesn't store enough information to do that. As you say, the main price there is some additional memory.

From my perspective, the main problem with plans to tweak the checkpoint code is that we don't have a really good benchmark that tracks both throughput and latency to test proposed changes against. Mark Wong has been working to get his TPC-E clone DBT-5 running regularly for that purpose, and last I heard that was basically done at this point--he's running daily tests now.
There's already a small pile of patches that adjust checkpoint behavior around that were postponed from being included in 9.1 mainly because it was hard to prove they were useful given the benchmark used to test them, pgbench. I have higher hopes for DBT-5 as being a test that gives informative data in this area. I would want to go back and revisit the existing patches (sorted checkpoints, spread sync) before launching into this whole new area. I don't think any of those has even been proven not to work, they just didn't help the slightly unrealistic pgbench write-heavy workload.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] include host names in hba error messages
On Fri, Jul 29, 2011 at 2:44 PM, Peter Eisentraut wrote: > On tis, 2011-07-19 at 14:17 -0400, Robert Haas wrote: >> I think it would be less confusing to write the IP address as the main >> piece of information, and put the hostname in parentheses only if we >> accepted it as valid (i.e. we did both lookups, and everything >> matched). >> >> ERROR: no pg_hba.conf entry for host 127.0.0.1 ("localhost"), user >> "x", database "y" >> >> As for the case where the forward lookup and reverse lookup don't >> match, could we add that as a DETAIL? >> >> ERROR: no pg_hba.conf entry for host 127.0.0.1, user "x", database "y" >> DETAIL: Forward and reverse DNS lookups do not match. > > On further reflection, the only way we would get a complete match host > name is if there actually were a line in pg_hba.conf with that host > name, but it didn't match because of other parameters. So that would be > quite rare, and so the error message would look one way or the other > depending on obscure circumstances, which would be confusing. > > But picking up on your second suggestion, I propose instead that we put > a note in the detail about the host name and what we know about it, if > we know it, e.g. > > ERROR: no pg_hba.conf entry for host 127.0.0.1, user "x", database "y" > DETAIL: Client IP address resolved to "localhost", forward lookup matches. > > I chose to use errdetail_log(), which only goes into the server log, so > we don't expose too much about the server's DNS setup to the client. Seems reasonable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI error messages
On lör, 2011-07-16 at 21:55 +0300, Heikki Linnakangas wrote: > I think I would prefer something like this: > > ERROR: could not serialize access due to read/write dependencies > among > transactions > DETAIL: Reason code: %s > HINT: The transaction might succeed if retried. > > Where %s gets the current detail field, untranslated, like: > > Canceled on commit attempt with conflict in from prepared pivot. Do you have an idea how to address this case: @@ -3865,7 +3865,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation, ereport(ERROR, (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), errmsg("could not serialize access due to read/write dependencies among transactions"), -errdetail_internal("Canceled on conflict out to old pivot %u.", xid), +errdetail("Reason code: %s.", "canceled on conflict out to old pivot %u", xid), // XXX bogus errhint("The transaction might succeed if retried."))); if (SxactHasSummaryConflictIn(MySerializableXact) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] include host names in hba error messages
On tis, 2011-07-19 at 14:17 -0400, Robert Haas wrote:
> I think it would be less confusing to write the IP address as the main
> piece of information, and put the hostname in parentheses only if we
> accepted it as valid (i.e. we did both lookups, and everything
> matched).
>
> ERROR: no pg_hba.conf entry for host 127.0.0.1 ("localhost"), user
> "x", database "y"
>
> As for the case where the forward lookup and reverse lookup don't
> match, could we add that as a DETAIL?
>
> ERROR: no pg_hba.conf entry for host 127.0.0.1, user "x", database "y"
> DETAIL: Forward and reverse DNS lookups do not match.

On further reflection, the only way we would get a complete match host name is if there actually were a line in pg_hba.conf with that host name, but it didn't match because of other parameters. So that would be quite rare, and so the error message would look one way or the other depending on obscure circumstances, which would be confusing.

But picking up on your second suggestion, I propose instead that we put a note in the detail about the host name and what we know about it, if we know it, e.g.

ERROR:  no pg_hba.conf entry for host 127.0.0.1, user "x", database "y"
DETAIL:  Client IP address resolved to "localhost", forward lookup matches.

I chose to use errdetail_log(), which only goes into the server log, so we don't expose too much about the server's DNS setup to the client.

diff --git i/src/backend/libpq/auth.c w/src/backend/libpq/auth.c
index d153880..1b6399d 100644
--- i/src/backend/libpq/auth.c
+++ w/src/backend/libpq/auth.c
@@ -439,6 +439,17 @@ ClientAuthentication(Port *port)
 						   NULL, 0,
 						   NI_NUMERICHOST);
 
+#define HOSTNAME_LOOKUP_DETAIL(port) \
+	(port->remote_hostname \
+	 ? (port->remote_hostname_resolv == +1 \
+		? errdetail_log("Client IP address resolved to \"%s\", forward lookup matches.", port->remote_hostname) \
+		: (port->remote_hostname_resolv == 0 \
+		   ? errdetail_log("Client IP address resolved to \"%s\", forward lookup not checked.", port->remote_hostname) \
+		   : (port->remote_hostname_resolv == -1 \
+			  ? errdetail_log("Client IP address resolved to \"%s\", forward lookup does not match.", port->remote_hostname) \
+			  : 0))) \
+	 : 0)
+
 	if (am_walsender)
 	{
 #ifdef USE_SSL
@@ -446,12 +457,14 @@ ClientAuthentication(Port *port)
 				(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
 				 errmsg("no pg_hba.conf entry for replication connection from host \"%s\", user \"%s\", %s",
 						hostinfo, port->user_name,
-						port->ssl ? _("SSL on") : _("SSL off"))));
+						port->ssl ? _("SSL on") : _("SSL off")),
+				 HOSTNAME_LOOKUP_DETAIL(port)));
 #else
 		ereport(FATAL,
 				(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
 				 errmsg("no pg_hba.conf entry for replication connection from host \"%s\", user \"%s\"",
-						hostinfo, port->user_name)));
+						hostinfo, port->user_name),
+				 HOSTNAME_LOOKUP_DETAIL(port)));
 #endif
 	}
 	else
@@ -462,13 +475,15 @@ ClientAuthentication(Port *port)
 				 errmsg("no pg_hba.conf entry for host \"%s\", user \"%s\", database \"%s\", %s",
 						hostinfo, port->user_name,
 						port->database_name,
-						port->ssl ? _("SSL on") : _("SSL off"))));
+						port->ssl ? _("SSL on") : _("SSL off")),
+				 HOSTNAME_LOOKUP_DETAIL(port)));
 #else
 		ereport(FATAL,
 				(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
 				 errmsg("no pg_hba.conf entry for host \"%s\", user \"%s\", database \"%s\"",
 						hostinfo, port->user_name,
-						port->database_name)));
+						port->database_name),
+				 HOSTNAME_LOOKUP_DETAIL(port)));
 #endif
 	}
 	break;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Common object property boards
On Fri, Jul 29, 2011 at 1:49 PM, Tom Lane wrote: > It would likely be better to not expose the struct type, just individual > lookup functions. I'm not sure about that... I think that's just going to introduce a lot of excess notation. >> And, a translation from ObjectType to type name (e.g "table", "type", ...) >> is helpful to generate error messages. > >> const char *get_object_type_name(ObjectType objtype); > > Again, I think this is too low level because of message translation > considerations. On this point I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
Hi all, PFA, patch which implements non-inheritable "ONLY" constraints. This has been achieved by introducing a new column "conisonly" in the pg_constraint catalog. Specifying 'ONLY' in the ALTER TABLE ADD CONSTRAINT CHECK command sets this new column to true. Constraints which have this column set to true can never be inherited by present or future children. The psql and pg_dump binaries have been modified to account for such persistent non-inheritable check constraints. This patch also has documentation changes along with relevant changes to the test cases. The regression runs pass fine with this patch applied. Comments and further feedback, if any, appreciated. Regards, Nikhils diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 5e5f8a7..683ad67 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -1995,6 +1995,16 @@ + conisonly + bool + + + This constraint is defined locally for the relation. It is a + non-inheritable constraint. + + + + conkey int2[] pg_attribute.attnum diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 4c2a4cd..3ee3ec0 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -984,6 +984,14 @@ + To add a check constraint only to a table and not to its children: + +ALTER TABLE ONLY distributors ADD CONSTRAINT zipchk CHECK (char_length(zipcode) = 5); + + (The check constraint will not be inherited by future children, either.) 
+ + + To remove a check constraint from a table and all its children: ALTER TABLE distributors DROP CONSTRAINT zipchk; diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 4399493..1b382b8 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -98,10 +98,10 @@ static Oid AddNewRelationType(const char *typeName, Oid new_array_type); static void RelationRemoveInheritance(Oid relid); static void StoreRelCheck(Relation rel, char *ccname, Node *expr, - bool is_validated, bool is_local, int inhcount); + bool is_validated, bool is_local, int inhcount, bool is_only); static void StoreConstraints(Relation rel, List *cooked_constraints); static bool MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr, - bool allow_merge, bool is_local); + bool allow_merge, bool is_local, bool is_only); static void SetRelationNumChecks(Relation rel, int numchecks); static Node *cookConstraint(ParseState *pstate, Node *raw_constraint, @@ -1860,7 +1860,7 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, Node *expr) */ static void StoreRelCheck(Relation rel, char *ccname, Node *expr, - bool is_validated, bool is_local, int inhcount) + bool is_validated, bool is_local, int inhcount, bool is_only) { char *ccbin; char *ccsrc; @@ -1943,7 +1943,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr, ccbin,/* Binary form of check constraint */ ccsrc,/* Source form of check constraint */ is_local, /* conislocal */ - inhcount);/* coninhcount */ + inhcount, /* coninhcount */ + is_only); /* conisonly */ pfree(ccbin); pfree(ccsrc); @@ -1984,7 +1985,7 @@ StoreConstraints(Relation rel, List *cooked_constraints) break; case CONSTR_CHECK: StoreRelCheck(rel, con->name, con->expr, !con->skip_validation, - con->is_local, con->inhcount); + con->is_local, con->inhcount, con->is_only); numchecks++; break; default: @@ -2100,6 +2101,7 @@ AddRelationNewConstraints(Relation rel, cooked->skip_validation = false; cooked->is_local = is_local; cooked->inhcount 
= is_local ? 0 : 1; + cooked->is_only = false; cookedConstraints = lappend(cookedConstraints, cooked); } @@ -2167,7 +2169,7 @@ AddRelationNewConstraints(Relation rel, * what ATAddCheckConstraint wants.)
Re: [HACKERS] [RFC] Common object property boards
Kohei Kaigai writes: > In addition to this suggestion, I think the big static array also contains > the following items: > - Text form of the object type (e.g, "table", "function", ...) What will you do with that that wouldn't be better done by calling getObjectDescription? The latter's output is somewhat localizable, but individual words would be hard to translate. > Does the main lookup function ought to return an entry of the big array? > If so, the definition of structure should be declared in objectaddress.h, > as follows: It would likely be better to not expose the struct type, just individual lookup functions. > And, a translation from ObjectType to type name (e.g "table", "type", ...) > is helpful to generate error messages. > const char *get_object_type_name(ObjectType objtype); Again, I think this is too low level because of message translation considerations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: OT: OFF TOPIC: returning multiple result sets from a stored procedure
>I honestly do not mean any offence, just out of curiosity. >If you guys care about money and time why would you spend the best years of >your life basically copying commercial products for free?

1) For the same reasons commercial vendors build competing products: different tools in the same category fit different scenarios.

2) The right-tool-for-the-job concept: sometimes a commercial solution is the right choice, sometimes an open-source solution is. Budgets, applicability, and local expertise should all factor into the platform-selection decision.

3) Because sometimes the output of a platform is more important than the platform itself: if your role is to work with data, not to produce software, it can in the end be more efficient to have a platform you can modify as needed.
[HACKERS] [RFC] Common object property boards
Robert Haas wrote: | I think that get_object_namespace() needs to be rethought. If you | take a look at AlterObjectNamespace() and its callers, you'll notice | that we already have, encoded in those call sites, the knowledge of | which object types can be looked up in which system caches, and which | columns within the resulting heap tuples will contain the schema OIDs. | Here, we're essentially duplicating that information in another | place, which doesn't seem good. I think perhaps we should create a | big static array, each row of which would contain: | | - ObjectType | - system cache ID for OID lookups | - system catalog table OID for scans | - attribute number for the name attribute, where applicable (see | AlterObjectNamespace) | - attribute number for the namespace attribute | - attribute number for the owner attribute | - ...and maybe some other properties | | We could then create a function (in objectaddress.c, probably) that | you call with an ObjectType argument, and it returns a pointer to the | appropriate struct entry, or calls elog() if none is found. This | function could be used by the existing object_exists() functions in | lieu of having its own giant switch statement, by | AlterObjectNamespace(), and by this new consolidated DROP code. If | you agree with this approach, we should probably make it a separate | patch. | Per Robert's suggestion, I'd like to see opinions about this feature prior to its implementation. In addition to his suggestion, I think the big static array should also contain the following items: - Text form of the object type (e.g., "table", "function", ...) - A validator function pointer. It should raise an error when, for example, relkind is not RELKIND_RELATION for an ObjectType of OBJECT_TABLE. Should the main lookup function return an entry of the big array? 
If so, the definition of the structure should be declared in objectaddress.h, as follows:

ObjectProperty get_object_property(ObjectType objtype);

In addition to the big array indexed by ObjectType, I'd like to have a utility function that translates an ObjectAddress to an ObjectType. (Some of these need a syscache lookup, because pg_class contains several object types: table, view, ...)

ObjectType get_object_type(const ObjectAddress *address);

If and when we have both the function to reference ObjectProperty and the one to translate ObjectAddress to ObjectType, it is quite easy to implement service routines to look up namespaceId, ownerId and others for the supplied ObjectAddress, as follows:

Oid get_object_namespace(const ObjectAddress *address);
Oid get_object_ownership(const ObjectAddress *address);
char *get_object_name(const ObjectAddress *address);
HeapTuple get_object_tuple(const ObjectAddress *address);

And a translation from ObjectType to a type name (e.g., "table", "type", ...) is helpful for generating error messages:

const char *get_object_type_name(ObjectType objtype);

Any comments please. Thanks, -- NEC Europe Ltd, SAP Global Competence Center KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
Robert Haas writes: > On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane wrote: >> Well, no, because the ScanPgRelation call is not failing internally. >> It's performing a seqscan of pg_class and not finding a matching tuple. > SnapshotNow race? That's what I would have guessed to start with, except that then it wouldn't be a persistent failure. You'd get one, but then the next query would succeed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote: >>> The thing that was bizarre about the one instance in the buildfarm was >>> that the error was persistent, ie, once a session had failed all its >>> subsequent attempts to access pg_class failed too. > >> I was thinking more along the lines of a failure while processing a >> sinval message emitted by the REINDEX. The sinval message doesn't get >> fully processed and therefore we get confused about what the >> relfilenode is for pg_class. If that happened for any other relation, >> we could recover by scanning pg_class. But if it happens for pg_class >> or pg_class_oid_index, we're toast, because we can't scan them without >> knowing what relfilenode to open. > > Well, no, because the ScanPgRelation call is not failing internally. > It's performing a seqscan of pg_class and not finding a matching tuple. SnapshotNow race? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
Robert Haas writes: > On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote: >> The thing that was bizarre about the one instance in the buildfarm was >> that the error was persistent, ie, once a session had failed all its >> subsequent attempts to access pg_class failed too. > I was thinking more along the lines of a failure while processing a > sinval message emitted by the REINDEX. The sinval message doesn't get > fully processed and therefore we get confused about what the > relfilenode is for pg_class. If that happened for any other relation, > we could recover by scanning pg_class. But if it happens for pg_class > or pg_class_oid_index, we're toast, because we can't scan them without > knowing what relfilenode to open. Well, no, because the ScanPgRelation call is not failing internally. It's performing a seqscan of pg_class and not finding a matching tuple. You could hypothesize about maybe an sinval message got missed leading us to scan the old (pre-VAC-FULL) copy of pg_class, but that still wouldn't explain how come we can't find a valid-looking entry for pg_class_oid_index in it. Tis a puzzlement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] USECS_* constants undefined with float8 timestamps?
Hi all, I just noticed that the USECS_* constants are not defined when the server is compiled without integer dates and timestamps. Explicitly, timestamp.h has:

#ifdef HAVE_INT64_TIMESTAMP
#define USECS_PER_DAY INT64CONST(86400000000)
#define USECS_PER_HOUR INT64CONST(3600000000)
#define USECS_PER_MINUTE INT64CONST(60000000)
#define USECS_PER_SEC INT64CONST(1000000)
#endif

Is there a particular reason for this? Even with float8 timestamps there are uses for these constants in extensions. -- Johann Oskarsson http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services Blog: http://my.opera.com/myrkraverk/blog/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote: > daveg writes: >> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: >>> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which >>> means that a REINDEX is happening as part of the same operation. In >>> 9.0, there's no point in doing VACUUM FULL immediately followed by >>> REINDEX. My guess is that this is happening either right around the >>> time the VACUUM FULL commits or right around the time the REINDEX >>> commits. It'd be helpful to know which, if you can figure it out. > >> I'll update my vacuum script to skip reindexes after vacuum full for 9.0 >> servers and see if that makes the problem go away. > > The thing that was bizarre about the one instance in the buildfarm was > that the error was persistent, ie, once a session had failed all its > subsequent attempts to access pg_class failed too. I gather from Dave's > description that it's working that way for him too. I can think of ways > that there might be a transient race condition against a REINDEX, but > it's very unclear why the failure would persist across multiple > attempts. The best idea I can come up with is that the session has > somehow cached a wrong commit status for the reindexing transaction, > causing it to believe that both old and new copies of the index's > pg_class row are dead ... but how could that happen? The underlying > state in the catalog is not wrong, because no concurrent sessions are > upset (at least not in the buildfarm case ... Dave, do you see more than > one session doing this at a time?). I was thinking more along the lines of a failure while processing a sinval message emitted by the REINDEX. The sinval message doesn't get fully processed and therefore we get confused about what the relfilenode is for pg_class. If that happened for any other relation, we could recover by scanning pg_class. 
But if it happens for pg_class or pg_class_oid_index, we're toast, because we can't scan them without knowing what relfilenode to open. Now that can't be quite right, because of course those are mapped relations... and I'm pretty sure there are some other holes in my line of reasoning too. Just thinking out loud... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cheaper snapshots
On Fri, 2011-07-29 at 10:23 -0400, Robert Haas wrote: > On Fri, Jul 29, 2011 at 10:20 AM, Hannu Krosing wrote: > >> An additional point to think about: if we were willing to insist on > >> streaming replication, we could send the commit sequence numbers via a > >> side channel rather than writing them to WAL, which would be a lot > >> cheaper. > > > > Why do you think that side channel is cheaper than main WAL ? > > You don't have to flush it to disk, You can probably write the "i became visible" WAL record without forcing a flush and still get the same visibility order. > and you can use some other lock > that isn't as highly contended as WALInsertLock to synchronize it. but you will need to synchronise it with WAL replay on slave anyway. It seems easiest to just insert it in the WAL stream and be done with it. > >> That might even be a reasonable thing to do, because if > >> you're doing log shipping, this is all going to be super-not-real-time > >> anyway. > > > > But perhaps you still may want to preserve visibility order to be able > > to do PITR to exact transaction "commit", no ? > > Maybe. In practice, I suspect most people won't be willing to pay the > price a feature like this would exact. Unless we find some really bad problems with different visibility orders on master and slave(s) you are probably right. -- --- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cheaper snapshots
On Fri, Jul 29, 2011 at 10:20 AM, Hannu Krosing wrote: >> An additional point to think about: if we were willing to insist on >> streaming replication, we could send the commit sequence numbers via a >> side channel rather than writing them to WAL, which would be a lot >> cheaper. > > Why do you think that side channel is cheaper than main WAL ? You don't have to flush it to disk, and you can use some other lock that isn't as highly contended as WALInsertLock to synchronize it. >> That might even be a reasonable thing to do, because if >> you're doing log shipping, this is all going to be super-not-real-time >> anyway. > > But perhaps you still may want to preserve visibility order to be able > to do PITR to exact transaction "commit", no ? Maybe. In practice, I suspect most people won't be willing to pay the price a feature like this would exact. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cheaper snapshots
On Thu, 2011-07-28 at 20:14 -0400, Robert Haas wrote: > On Thu, Jul 28, 2011 at 7:54 PM, Ants Aasma wrote: > > On Thu, Jul 28, 2011 at 11:54 PM, Kevin Grittner > > wrote: > >> (4) We communicate acceptable snapshots to the replica to make the > >> order of visibility visibility match the master even when that > >> doesn't match the order that transactions returned from commit. > > > > I wonder if some interpretation of 2 phase commit could make Robert's > > original suggestion implement this. > > > > On the master the commit sequence would look something like: > > 1. Insert commit record to the WAL > > 2. Wait for replication > > 3. Get a commit seq nr and mark XIDs visible > > 4. WAL log the seq nr > > 5. Return success to client > > > > When replaying: > > * When replaying commit record, do everything but make > > the tx visible. > > * When replaying the commit sequence number > >if there is a gap between last visible commit and current: > > insert the commit sequence nr. to list of waiting commits. > >else: > > mark current and all directly following waiting tx's visible > > > > This would give consistent visibility order on master and slave. Robert > > is right that this would undesirably increase WAL traffic. Delaying this > > traffic would undesirably increase replay lag between master and slave. > > But it seems to me that this could be an optional WAL level on top of > > hot_standby that would only be enabled if consistent visibility on > > slaves is desired. > > I think you nailed it. Agreed, this would keep current semantics on master and same visibility order on master and slave. > An additional point to think about: if we were willing to insist on > streaming replication, we could send the commit sequence numbers via a > side channel rather than writing them to WAL, which would be a lot > cheaper. Why do you think that side channel is cheaper than main WAL ? How would you handle synchronising the two ? 
> That might even be a reasonable thing to do, because if > you're doing log shipping, this is all going to be super-not-real-time > anyway. But perhaps you still may want to preserve visibility order to be able to do PITR to exact transaction "commit", no ? > OTOH, I know we don't want to make WAL shipping anything less > than a first class citizen, so maybe not. > > At any rate, we may be getting a little sidetracked here from the > original point of the thread, which was how to make snapshot-taking > cheaper. Maybe there's some tie-in to when transactions become > visible, but I think it's pretty weak. The existing system could be > hacked up to avoid making transactions visible out of LSN order, and > the system I proposed could make them visible either in LSN order or > do the same thing we do now. They are basically independent problems, > AFAICS. Agreed. -- --- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] per-column FDW options, v5
2011/7/29 Shigeru Hanada : > Here is a rebased version of per-column FDW options patch. I've > proposed this patch in last CF, but it was marked as returned with > feedback. So I would like to propose in next CF 2011-09. I already > moved CF item into new topic "SQL/MED" of CF 2011-09. I did a quick review of this patch and it looks good to me, modulo some quibbles with the wording of the documentation and comments and a couple of minor stylistic nitpicks. Barring objections, I'll fix this up a bit and commit it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Incremental checkpoints
Hi, I have read all the information about checkpoints in PostgreSQL that I could find. I think the current implementation of checkpoints is not good for a huge shared buffer cache with many WAL segments. If there are more buffers, and if buffers can be written less often, more updates to each buffer can be combined, so the total number of writes to disk will be significantly lower. I think incremental checkpoints can achieve this goal (maybe more), and the price is additional memory (about 1/1000 of the size of the buffer cache).

My main source of information is http://wiki.postgresql.org/wiki/User:Gsmith#How_do_checkpoints_happen_inside_the_PostgreSQL_backend.3F

I see that some data is required to be written into WAL in 3) and 6). I will use CD to denote that data, and P1, P2... to denote pages that are dirty and have to be written to disk in 4). In an incremental checkpoint, when a WAL segment has been written we do not start writing pages immediately; instead we add pages P1, P2 ... and CD to a queue. If in the meantime the background writer has to clean one of these pages, that page is removed from the queue. When checkpoint_segments have been written to the transaction log, the queue contains: P1, P2 ... CD, Pi ... CD, Pj ... CD ...

Here we have to make a checkpoint in order to free the first WAL segment. Only the pages before the first CD have to be written and fsync’d. I suppose this task can be done in the background writer, so it would make some number of writes per round, both lru and checkpoint. There is no deadline for each incremental checkpoint, but as the WAL grows the total number of writes has to increase. Also, it is not required to do a checkpoint for each WAL segment: it is possible to write N pages from the queue and to combine several potential checkpoints into one.

I hope I have explained the general idea. I am not a C programmer, so it is hard for me to give more details.

Jordan Ivanov -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
> We could imagine doing something like CHECK ONLY (foo), but that seems > quite non-orthogonal with (a) everything else in CREATE TABLE, and > (b) ALTER TABLE ONLY. > > Yeah, I thought about CHECK ONLY support as part of the table definition, but as you say, it appears to be too non-standard right now, and we can always revisit this later if the need is felt. Regards, Nikhils
Re: [HACKERS] error: could not find pg_class tuple for index 2662
daveg writes: > On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: >> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which >> means that a REINDEX is happening as part of the same operation. In >> 9.0, there's no point in doing VACUUM FULL immediately followed by >> REINDEX. My guess is that this is happening either right around the >> time the VACUUM FULL commits or right around the time the REINDEX >> commits. It'd be helpful to know which, if you can figure it out. > I'll update my vacuum script to skip reindexes after vacuum full for 9.0 > servers and see if that makes the problem go away. The thing that was bizarre about the one instance in the buildfarm was that the error was persistent, ie, once a session had failed all its subsequent attempts to access pg_class failed too. I gather from Dave's description that it's working that way for him too. I can think of ways that there might be a transient race condition against a REINDEX, but it's very unclear why the failure would persist across multiple attempts. The best idea I can come up with is that the session has somehow cached a wrong commit status for the reindexing transaction, causing it to believe that both old and new copies of the index's pg_class row are dead ... but how could that happen? The underlying state in the catalog is not wrong, because no concurrent sessions are upset (at least not in the buildfarm case ... Dave, do you see more than one session doing this at a time?). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cheaper snapshots
Robert Haas wrote: >> (4) We communicate acceptable snapshots to the replica to make >> the order of visibility match the master even when >> that doesn't match the order that transactions returned from >> commit. >> I (predictably) like (4) -- even though it's a lot of work > > I think that (4), beyond being a lot of work, will also have > pretty terrible performance. You're basically talking about > emitting two WAL records for every commit instead of one. Well, I can think of a great many other ways this could be done, each with its own trade-offs of various types of overhead against how close the replica is to current. At one extreme you could do what you describe; at the other you could generate a new snapshot on the replica once every few minutes. Then there are more clever ways; in discussions a few months ago I suggested that adding two new bit flags to the commit record would suffice, and I don't remember anyone blowing holes in that idea. Of course, that was to achieve serializable behavior on the replica, based on some assumption that the current hot standby already supported repeatable read. We might need another bit or two to solve the problems with that which have surfaced on this thread. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
> > Yeah, I have already hacked it a bit. This constraint now needs to be > > spit out later as an ALTER command with ONLY attached to it > > appropriately. Earlier all CHECK constraints were generally emitted as > > part of the table definition itself. > > IIRC, there's already support for splitting out a constraint that way, > in order to deal with circular dependencies. You just need to treat > this as an additional reason for splitting. > > Yeah, I have indeed followed the existing separate printing logic for "ONLY" constraints. Had to make the table dependent on this constraint to print the constraint *after* the table definition. Regards, Nikhils
Re: [HACKERS] Check constraints on partition parents only?
> > Yeah, I have already hacked it a bit. This constraint now needs to be > > spit out later as an ALTER command with ONLY attached to it > > appropriately. Earlier all CHECK constraints were generally emitted as > > part of the table definition itself. > > Hrm. That doesn't seem so good. Maybe we've got the design wrong > here. It doesn't seem like we want to lose the ability to define > arbitrary constraints at table-creation time. > > Well, the handling is different only for "ONLY" constraints. Normal constraints can still be attached at table-creation time. Regards, Nikhils
Re: [HACKERS] Check constraints on partition parents only?
Robert Haas writes: > On Fri, Jul 29, 2011 at 8:30 AM, Nikhil Sontakke wrote: >> Yeah, I have already hacked it a bit. This constraint now needs to be >> spit out later as an ALTER command with ONLY attached to it >> appropriately. Earlier all CHECK constraints were generally emitted as >> part of the table definition itself. > Hrm. That doesn't seem so good. Maybe we've got the design wrong > here. It doesn't seem like we want to lose the ability to define > arbitrary constraints at table-creation time. Well, you can't define arbitrary indexes within the CREATE TABLE syntax, either. This does not bother me a lot. We could imagine doing something like CHECK ONLY (foo), but that seems quite non-orthogonal with (a) everything else in CREATE TABLE, and (b) ALTER TABLE ONLY. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
Nikhil Sontakke writes: >> (Also, don't forget you need to hack pg_dump, too.) > Yeah, I have already hacked it a bit. This constraint now needs to be > spit out later as an ALTER command with ONLY attached to it > appropriately. Earlier all CHECK constraints were generally emitted as > part of the table definition itself. IIRC, there's already support for splitting out a constraint that way, in order to deal with circular dependencies. You just need to treat this as an additional reason for splitting. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
On Fri, Jul 29, 2011 at 8:30 AM, Nikhil Sontakke wrote: > Yeah, I have already hacked it a bit. This constraint now needs to be > spit out later as an ALTER command with ONLY attached to it > appropriately. Earlier all CHECK constraints were generally emitted as > part of the table definition itself. Hrm. That doesn't seem so good. Maybe we've got the design wrong here. It doesn't seem like we want to lose the ability to define arbitrary constraints at table-creation time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
>> psql=# \d a
>>         Table "public.a"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  b      | integer |
>> Check constraints:
>>     "achk" CHECK (false)
>>     "bchk" CHECK (b > 0)
>>
>> Is this acceptable? Or we need to put in work into psql to show ONLY
>> somewhere in the description? If yes, ONLY CHECK sounds weird, maybe
>> we should use LOCAL CHECK or some such mention:
>>
>> Check constraints:
>>     "achk" LOCAL CHECK (false)
>>     "bchk" CHECK (b > 0)

> I think you need to stick with "ONLY". Using two different words is
> just going to create confusion. You could fool around with where
> exactly you put it on the line, but switching to a different word
> seems like not a good idea.

Ok, maybe something like:

    "achk" (ONLY) CHECK (false)

> (Also, don't forget you need to hack pg_dump, too.)

Yeah, I have already hacked it a bit. This constraint now needs to be
spit out later as an ALTER command with ONLY attached to it
appropriately. Earlier all CHECK constraints were generally emitted as
part of the table definition itself.

Regards,
Nikhils
Re: [HACKERS] Check constraints on partition parents only?
On Fri, Jul 29, 2011 at 7:41 AM, Nikhil Sontakke wrote:
> Hi,
>
>>> Any preferences for the name?
>>>  connoinh
>>>  conisonly
>>>  constatic or confixed
>>
>> I'd probably pick conisonly from those choices.
>
> The use of "\d" inside psql will show ONLY constraints without any
> embellishments similar to normal constraints. E.g.
>
> ALTER TABLE ONLY a ADD CONSTRAINT achk CHECK (FALSE);
> ALTER TABLE a ADD CONSTRAINT bchk CHECK (b > 0);
>
> psql=# \d a
>         Table "public.a"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  b      | integer |
> Check constraints:
>     "achk" CHECK (false)
>     "bchk" CHECK (b > 0)
>
> Is this acceptable? Or we need to put in work into psql to show ONLY
> somewhere in the description? If yes, ONLY CHECK sounds weird, maybe
> we should use LOCAL CHECK or some such mention:
>
> Check constraints:
>     "achk" LOCAL CHECK (false)
>     "bchk" CHECK (b > 0)

I think you need to stick with "ONLY". Using two different words is
just going to create confusion. You could fool around with where
exactly you put it on the line, but switching to a different word seems
like not a good idea.

(Also, don't forget you need to hack pg_dump, too.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Check constraints on partition parents only?
Hi,

>> Any preferences for the name?
>>  connoinh
>>  conisonly
>>  constatic or confixed

> I'd probably pick conisonly from those choices.

The use of "\d" inside psql will show ONLY constraints without any
embellishments similar to normal constraints. E.g.

ALTER TABLE ONLY a ADD CONSTRAINT achk CHECK (FALSE);
ALTER TABLE a ADD CONSTRAINT bchk CHECK (b > 0);

psql=# \d a
        Table "public.a"
 Column |  Type   | Modifiers
--------+---------+-----------
 b      | integer |
Check constraints:
    "achk" CHECK (false)
    "bchk" CHECK (b > 0)

Is this acceptable? Or we need to put in work into psql to show ONLY
somewhere in the description? If yes, ONLY CHECK sounds weird, maybe we
should use LOCAL CHECK or some such mention:

Check constraints:
    "achk" LOCAL CHECK (false)
    "bchk" CHECK (b > 0)

Regards,
Nikhils
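[Editorial sketch] The point of the non-inheritable constraint shows up once a child table exists. The following extends Nikhil's example with a hypothetical child table (`a_child` is not from the thread) and reflects the patch's intended semantics, not released PostgreSQL behavior at the time of this discussion:

```sql
-- Parent with one ONLY constraint and one ordinary constraint,
-- exactly as in Nikhil's example:
CREATE TABLE a (b integer);
ALTER TABLE ONLY a ADD CONSTRAINT achk CHECK (false);
ALTER TABLE a ADD CONSTRAINT bchk CHECK (b > 0);

-- Under the patch's intended semantics, a child created afterwards
-- inherits bchk but not the parent-only achk:
CREATE TABLE a_child () INHERITS (a);

-- So rows can go into the child even though the parent carries
-- CHECK (false) -- the partition-parent use case motivating the
-- patch, where the parent itself is meant to stay empty:
INSERT INTO a_child VALUES (1);   -- passes bchk; achk does not apply
INSERT INTO a VALUES (1);         -- rejected by achk on the parent
```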
Re: [HACKERS] XMLATTRIBUTES vs. values of type XML
On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
> On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
>> On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
>>> Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
>>> would behave as in
>>>
>>> SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&amp;'::XML
>>> AS text) AS "a"))
>>
>> With both 9.1 and 9.2 this query returns
>>
>>     xmlelement
>> --------------------
>>  <t a="&amp;amp;"/>
>>
>> i.e. makes the value of "a" represent the *literal* string '&amp;', *not*
>> the literal string '&'. Just to be sure there's no misunderstanding here -
>> is this what you expect?
>
> Well, I expect it to fail.

Now you've lost me. What exactly should fail under what circumstances?

>> What's the workaround if you have a value of type XML, e.g. '&amp;',
>> and want to set an attribute to the value represented by that XML
>> fragment (i.e. '&')? Since we have no XMLUNESCAPE function, I don't
>> see an easy way to do that. Maybe I'm missing something, though.
>
> It may be worth researching whether the XMLSERIALIZE function is
> actually doing what it should, or whether it could/should do some
> unescaping.

I don't see how that could work. It can't unescape anything in e.g.
'<x>&amp;</x>', because the result would be a quite useless
not-really-XML kind of thing. It could unescape '&amp;', but that kind
of content-dependent behaviour seems even worse than my proposed
escaping rules for XMLATTRIBUTES.

> Unfortunately, in the latest SQL/XML standard the final answer is
> nested deep in the three other standards, so I don't have an answer
> right now. But there are plenty of standards in this area, so I'd hope
> that one of them can give us the right behavior, instead of us making
> something up.

Which standards do you have in mind there? If you can point me to a
place where I can obtain them, I could check if there's something in
them which helps.
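[Editorial sketch] The double-escaping Florian describes can be illustrated outside SQL. This Python sketch demonstrates the escaping arithmetic only; it is not PostgreSQL code, and the "serialization" step simply models XMLSERIALIZE's behavior as described in the thread (no unescaping):

```python
from xml.sax.saxutils import escape, unescape

# An XML text fragment that *represents* the single character '&':
fragment = "&amp;"

# XMLSERIALIZE (as described in the thread) emits the fragment's
# serialized text without unescaping it, so we get '&amp;' back:
serialized = fragment

# XMLATTRIBUTES then applies attribute escaping to that text value,
# escaping the ampersand a second time:
attribute_value = escape(serialized)
print(attribute_value)     # &amp;amp;

# The attribute now represents the literal string '&amp;', not '&'.
# Recovering '&' needs an explicit unescape step -- the missing
# XMLUNESCAPE Florian mentions:
print(unescape(fragment))  # &
```

The second escape is why the result in Florian's example reads `a="&amp;amp;"`: each pass through attribute escaping turns `&` into `&amp;`, and nothing on the serialization side undoes it.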
best regards,
Florian Pflug