Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, Feb 4, 2016 at 5:23 PM, Stas Kelvich wrote:

(Please do not top-post, this breaks the thread flow.)

> I've looked over the proposed patch and migrated the shell test scripts that
> I've used for testing two-phase commits on master/slave to this test
> framework. Everything looks mature, and I didn't encounter any problems with
> writing new tests using this infrastructure.
> From my point of view I don't see any problems with committing these patches
> in their current state.

Thanks for the review!

> 0) There are several routines that do the actual checking, like
> is/command_ok/command_fails. I think it would be very handy to have wrappers
> psql_ok/psql_fails that call psql through command_ok/fails.

Do you have a test case in mind for it?

> 1) Better to raise a more meaningful error when IPC::Run is absent.

This has been discussed before, and as far as I recall the current behavior
was concluded to be fine. That's where --enable-tap-tests becomes meaningful.

> 2) --enable-tap-tests deserves a mention in test/recovery/README and a more
> obvious error message when one tries to run make check in test/recovery
> without --enable-tap-tests.

When running without --enable-tap-tests from src/test/recovery you would get
the following error, per how prove_check is defined: "TAP tests not enabled"

> 3) Is it hard to give the ability to run TAP tests in extensions?

Not really. You would need to enforce a check rule or similar. For the
recovery test suite I have mapped the check rule to prove_check.

> 4) It would be handy if make check wrote the path to the log files in case
> of a failed test.

Hm, perhaps. The log files are hardcoded in log/, so it is not as if we don't
know where they are. That's an argument for the main TAP suite though, not
really this series of patches.

> 5) psql() accepts a database name as its first argument, but everywhere in
> the tests it is 'postgres'. Isn't it simpler to store the dbname in the
> connstr, and have a separate function to change the database?
> 6) Clean logs on prove restart? Clean up tmp installations?

Those are issues proper to the main TAP infrastructure, though I agree that we
could improve things here, particularly for temporary installations... Hm,
should they be cleaned up automatically when a test failure happens?

> 7) make check sets PGPORT and PG_REGRESS for prove. Is that necessary?

No, that's not needed (I think I noticed that at some point) and that's a bug.
We could live without setting it.
--
Michael
Re: [HACKERS] 2016-01 Commitfest
On 2016-02-03 12:32:47 -0500, Tom Lane wrote:
> Heikki and/or Andres have their names on three of the remaining RFC
> patches; it's unlikely any other committer will touch those patches
> unless they take their names off.

If you want to take over the timestamp patch, please feel free to do so.
Otherwise I'll get to it in ~10 days.

I'm working on and off on the checkpointer flushing thing, so I don't think it
makes sense for somebody else to take that over at this point.

Andres
Re: [HACKERS] Using quicksort for every external sort run
On Sat, Jan 30, 2016 at 5:29 AM, Robert Haas wrote:
>> I meant use "quicksort with spillover" simply because an estimated
>> 90%+ of all tuples have already been consumed. Don't consider the
>> tuple width, etc.
>
> Hmm, it's a thought.

To be honest, it's a bit annoying that this is the one issue we're stuck on,
because "quicksort with spillover" is clearly of less importance overall.
(This is a distinct issue from not using a replacement selection style heap
for the first run much of the time, which seems to be a discussion about
whether and to what extent the *traditional* advantages of replacement
selection hold today, as opposed to a discussion about a very specific
crossover point in my patch.)

>> Really? Just try it with a heap that is not tiny. Performance tanks.
>> The fact that replacement selection can produce one long run then
>> becomes a liability, not a strength. With a work_mem of something like
>> 1GB, it's *extremely* painful.
>
> I'm not sure exactly what you think I should try. I think a couple of
> people have expressed the concern that your patch might regress things
> on data that is all in order, but I'm not sure if you think I should
> try that case or some case that is not-quite-in-order. "I don't see
> that you've justified that statement" is referring to the fact that
> you presented no evidence in your original post that it's important to
> sometimes use quicksorting even for run #1. If you've provided some
> test data illustrating that point somewhere, I'd appreciate a pointer
> back to it.

I think that the answer to what you should try is simple: any case involving
a large heap (say, a work_mem of 1GB). No other factor like correlation seems
to change the conclusion that that is generally bad. If you have a
correlation, then that is *worse* if "quicksort with spillover" always has us
use a heap for the first run, because it prolongs the pain of using the
cache-inefficient heap (note that this is an observation about "quicksort
with spillover" in particular, and not replacement selection in general). The
problem you'll see is that there is a large heap which is *slow* to spill
from, and that's pretty obvious with or without a correlation.

In general it seems unlikely that having one long run during the merge (i.e.
no merge -- seen when the heap builds one long run because we got "lucky" and
"quicksort with spillover" encountered a correlation) can ever hope to make
up for this. It *could* still make up for it if:

1. There isn't much to make up for in the first place, because the heap is
CPU cache resident. Testing this with a work_mem that is the same size as CPU
L3 cache seems a bit pointless to me, and I think we've seen that a few
times.

and:

2. There are many passes required without a replacement selection heap,
because the volume of data is just so much greater than the low work_mem
setting. Replacement selection makes the critical difference because there is
a correlation, perhaps strong enough to make it one or two runs rather than,
say, 10 or 20 or 100.

I've already mentioned many times that linear growth in the size of work_mem
sharply reduces the need for additional passes during the merge phase (the
observation about quadratic growth that I won't repeat). These days, it's
hard to recommend anything other than "use more memory" to someone trying to
use 4MB to sort 10GB of data.

Yeah, it would also be faster to use replacement selection for the first run
in the hope of getting lucky (actually lucky this time; no quotes), but it's
hard to imagine that that's going to be a better option, no matter how frugal
the user is. Helping users recognize when they could use more memory
effectively seems like the best strategy. That was the idea behind
multipass_warning, but you didn't like that (Greg Stark was won over on the
multipass_warning idea, though). I hope we can offer something roughly like
that at some point (a view?), because it makes sense.

> How about always starting with replacement selection, but limiting the
> amount of memory that can be used with replacement selection to some
> small value? It could be a separate GUC, or a hard-coded constant
> like 20MB if we're fairly confident that the same value will be good
> for everyone. If the tuples aren't in order, then we'll pretty
> quickly come to the end of the first run and switch to quicksort.

This seems acceptable, although note that we don't have to decide until we
reach the work_mem limit, and not before. If you want to use a heap for the
first run, I'm not excited about the idea, but if you insist then I'm glad
that you at least propose to limit it to the kind of cases that we *actually*
saw regressed (i.e. low work_mem settings -- like the default work_mem
setting, 4MB). We've seen no actual case with a larger work_mem that is
advantaged by using a heap, even *with* a strong correlation (this is
actually *worst of all*); that's where I am
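For readers following the thread, here is a minimal, self-contained sketch of
replacement-selection run generation (illustration only; this is not
tuplesort.c code, and all names are invented). Note how every pop sifts
through O(log n) scattered heap slots, which is the cache-inefficiency at
issue, and how an incoming value smaller than the last one emitted must be
tagged for the next run:

/*
 * Sketch of replacement selection with a tiny 4-entry "work_mem" heap.
 * Entries are ordered first by run number, then by key, so the current
 * run drains fully before the next one starts.
 */
#include <stdio.h>

#define HEAPSIZE 4				/* stand-in for work_mem */

typedef struct { int run, key; } Entry;

static Entry heap[HEAPSIZE];
static int nentries = 0;

static int
cmp(Entry a, Entry b)
{
	if (a.run != b.run)
		return a.run < b.run ? -1 : 1;
	return (a.key > b.key) - (a.key < b.key);
}

static void
heap_insert(Entry e)
{
	int i = nentries++;

	heap[i] = e;
	while (i > 0 && cmp(heap[i], heap[(i - 1) / 2]) < 0)
	{
		Entry tmp = heap[i];
		heap[i] = heap[(i - 1) / 2];
		heap[(i - 1) / 2] = tmp;
		i = (i - 1) / 2;
	}
}

static Entry
heap_pop(void)
{
	Entry top = heap[0];
	int i = 0;

	heap[0] = heap[--nentries];
	for (;;)
	{
		int l = 2 * i + 1, r = l + 1, s = i;

		if (l < nentries && cmp(heap[l], heap[s]) < 0) s = l;
		if (r < nentries && cmp(heap[r], heap[s]) < 0) s = r;
		if (s == i) break;
		Entry tmp = heap[i]; heap[i] = heap[s]; heap[s] = tmp;
		i = s;
	}
	return top;
}

int
main(void)
{
	int input[] = {5, 1, 9, 2, 8, 3, 7, 4, 6, 0};
	int n = sizeof(input) / sizeof(input[0]);
	int i;

	for (i = 0; i < HEAPSIZE && i < n; i++)
		heap_insert((Entry) {0, input[i]});

	for (; nentries > 0; i++)
	{
		Entry e = heap_pop();

		printf("run %d: %d\n", e.run, e.key);
		if (i < n)
		{
			/* a value smaller than the one just emitted starts a new run */
			int run = input[i] >= e.key ? e.run : e.run + 1;

			heap_insert((Entry) {run, input[i]});
		}
	}
	return 0;
}

With this partially correlated input the sketch emits a first run of seven
values and a second of three, i.e. a first run well beyond the heap's size --
exactly the "longer first run" property being weighed against the heap's poor
cache behavior.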
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Thu, Feb 4, 2016 at 3:28 PM, Etsuro Fujita wrote:
> On 2016/02/04 17:58, Etsuro Fujita wrote:
>> On 2016/02/04 8:00, Robert Haas wrote:
>>> On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas wrote:
>>>> On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat wrote:
>>>>> PFA patches with naming conventions similar to previous ones.
>>>>> pg_fdw_core_v7.patch: core changes
>>>>> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
>>>>> pg_join_pd_v7.patch: combined patch for ease of testing.
>
> One more: I think the following in postgresGetForeignJoinPaths() is a good
> idea, but I think it's okay to just check whether root->rowMarks is
> non-NIL, because, since we have rowmarks for all base relations except the
> target, if we have root->parse->commandType == CMD_DELETE (or
> root->parse->commandType == CMD_UPDATE), then there would be at least one
> non-target base relation in the joinrel, which would have a rowmark.

Sorry, I am unable to understand it fully. What you are suggesting is that if
there are root->rowMarks, then we are sure that there is at least one base
relation apart from the target which needs its rows locked. But even if we
don't have one, a change to a row of the target relation after it was scanned
can still fire the EPQ check, which would need the local plan to be executed.
Thus even if root->rowMarks is NIL, the EPQ check can fire and we will need
an alternate local plan.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations
On Wed, Feb 3, 2016 at 10:48 AM, Robert Haas wrote:
> I don't feel qualified to have an opinion on whether this is an
> improvement. I'm a little skeptical of changes like this on general
> principle because sometimes one clientele wants error A to be reported
> rather than error B and some other clientele wants the reverse.
> Deciding who is right is above my pay grade.

Exclusion constraints can sometimes have one client deadlock, rather than see
an exclusion violation. The particular client that sees an error is undefined
either way, so I personally felt that that wasn't very important. But that's
a problem that I'm closer to, and I won't express an opinion on this patch.
--
Peter Geoghegan
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Tue, Feb 2, 2016 at 7:19 PM, Robert Haas wrote:
> On Mon, Feb 1, 2016 at 12:27 AM, Amit Kapila wrote:
>> Fixed.
>
> This patch doesn't build:
>
> ./xfunc.sgml:int lwlock_count = 0;
> Tabs appear in SGML/XML files

Changed the documentation and now I am not getting the build failure.

> The #define NUM_LWLOCKS 1 just seems totally unnecessary, as does int
> lwlock_count = 0. You're only assigning one lock! I'd just do
> RequestAddinLWLockTranche("pg_stat_statements locks", 1);
> pgss->lock = GetLWLockAddinTranche("pg_stat_statements locks")->lock;
> and call it good.

Changed as per suggestion.

> I think we shouldn't foreclose the idea of core users of this facility
> by using names like NumLWLocksByLoadableModules(). Why can't an
> in-core client use this API? I think instead of calling these "addin
> tranches" we should call them "named tranches"; thus public APIs
> RequestNamedLWLockTranche() and GetNamedLWLockTranche(), and private
> variables NamedLWLockTrancheRequests,
> NamedLWLockTrancheRequestsAllocated, etc. In fact,

Changed as per suggestion.

> I do not see an obvious reason why the two loops in CreateLWLocks()
> that end with "} while (++i < LWLockTrancheRequestsCount);" could not
> be merged, and I believe that would be cleaner than what you've got
> now. Similarly, the two loops in GetLWLockAddinTranche() could also
> be merged. Just keep a running total and return it when you find a
> match.
>
> I think it would be a good idea to merge LWLockAddInTrancheShmemSize
> into LWLockShmemSize. I don't see why that function can't compute a
> grand total and return it.

Agreed with both points and changed as per suggestion.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: separate_tranche_extensions_v4.patch
Re: [HACKERS] WAL logging problem in 9.4.3?
On 22/10/15 03:56, Michael Paquier wrote:
> On Wed, Oct 21, 2015 at 11:53 PM, Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>> Thanks. For comparison, I wrote a patch to implement what I had in
>>> mind. When a WAL-skipping COPY begins, we add an entry for that
>>> relation in a "pending-fsyncs" hash table. Whenever we perform any
>>> action on a heap that would normally be WAL-logged, we check if the
>>> relation is in the hash table, and skip WAL-logging if so.
>>
>> I think this wasn't applied, was it?
>
> No, it was not applied.

I dropped the ball on this one back in July, so here's an attempt to revive
this thread.

I spent some time fixing the remaining issues with the prototype patch I
posted earlier, and rebased that on top of current git master. See attached.

Some review of that would be nice. If there are no major issues with it, I'm
going to create backpatchable versions of this for 9.4 and below.

- Heikki

>From 063e1aa258800873783190a9678d551b43c0e39e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Thu, 4 Feb 2016 15:21:09 +0300
Subject: [PATCH 1/1] Fix the optimization to skip WAL-logging on table
 created in same xact.

There were several bugs in the optimization to skip WAL-logging for a table
that was created (or truncated) in the same transaction, with
wal_level=minimal, leading to data loss if a crash happened after the
optimization was used:

* If the table was created, and then truncated, and then loaded with COPY,
we would replay the truncate record at commit, and the table would end up
being empty after replay.

* If there is a trigger on a table that modifies the same table, and you
COPY to the table in the transaction that created it, you might have some
WAL-logged operations on a page, performed by the trigger, intermixed with
the non-WAL-logged inserts done by the COPY. That can lead to crash at
recovery, because we might try to replay a WAL record that e.g. updates a
tuple, but insertion of the tuple was not WAL-logged.
---
 src/backend/access/heap/heapam.c        | 254 +++-
 src/backend/access/heap/pruneheap.c     |   2 +-
 src/backend/access/heap/rewriteheap.c   |   4 +-
 src/backend/access/heap/visibilitymap.c |   2 +-
 src/backend/access/transam/xact.c       |   7 +
 src/backend/catalog/storage.c           | 250 ---
 src/backend/commands/copy.c             |  14 +-
 src/backend/commands/createas.c         |   9 +-
 src/backend/commands/matview.c          |   6 +-
 src/backend/commands/tablecmds.c        |   5 +-
 src/backend/commands/vacuumlazy.c       |   6 +-
 src/backend/storage/buffer/bufmgr.c     |  47 --
 src/include/access/heapam.h             |   8 +-
 src/include/access/heapam_xlog.h        |   2 +
 src/include/catalog/storage.h           |   3 +
 src/include/storage/bufmgr.h            |   2 +
 16 files changed, 487 insertions(+), 134 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f443742..79298e2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -34,6 +34,28 @@
  *	  the POSTGRES heap access method used for all POSTGRES
  *	  relations.
  *
+ * WAL CONSIDERATIONS
+ *	  All heap operations are normally WAL-logged. but there are a few
+ *	  exceptions. Temporary and unlogged relations never need to be
+ *	  WAL-logged, but we can also skip WAL-logging for a table that was
+ *	  created in the same transaction, if we don't need WAL for PITR or
+ *	  WAL archival purposes (i.e. if wal_level=minimal), and we fsync()
+ *	  the file to disk at COMMIT instead.
+ *
+ *	  The same-relation optimization is not employed automatically on all
+ *	  updates to a table that was created in the same transacton, because
+ *	  for a small number of changes, it's cheaper to just create the WAL
+ *	  records than fsyncing() the whole relation at COMMIT. It is only
+ *	  worthwhile for (presumably) large operations like COPY, CLUSTER,
+ *	  or VACUUM FULL. Use heap_register_sync() to initiate such an
+ *	  operation; it will cause any subsequent updates to the table to skip
+ *	  WAL-logging, if possible, and cause the heap to be synced to disk at
+ *	  COMMIT.
+ *
+ *	  To make that work, all modifications to heap must use
+ *	  HeapNeedsWAL() to check if WAL-logging is needed in this transaction
+ *	  for the given block.
+ *
  *-
  */
 #include "postgres.h"
@@ -55,6 +77,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/storage.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -2332,12 +2355,6 @@ FreeBulkInsertState(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
>>> If so, my concern is, the helper function probably wouldn't
>>> extend to the parameterized-foreign-join-path cases, though that
>>> would work well for the unparameterized-foreign-join-path cases. We
>>> don't support parameterized-foreign-join paths for 9.6?
>>
>> If we do not find a local path with the given parameterization, it means
>> there are other local parameterized paths which are superior to it. This
>> possibly indicates that there will be foreign join parameterized paths
>> which are superior to this parameterized path, so we basically do not
>> create a foreign join path with that parameterization.
>
> The latest version of the postgres_fdw join pushdown patch will support
> only the unparameterized-path case, so we don't have to consider this, but
> why do you think the superiority of parameterizations is preserved between
> remote joining and local joining?

AFAIU, parameterization for local paths bubbles up from base relations. For
foreign relations, we calculate the cost of parameterization when
use_remote_estimate is ON, which means it's accurate. So, apart from getting
clause selectivity wrong (and if foreign tables were analyzed regularly, even
that won't be the case, I guess), there would only be some small sway in the
costs as compared to those of parameterized foreign join paths. So, I am
guessing that the local estimates for parameterized join paths would be close
to those of parameterized foreign paths (if we were to produce them). Hence
my statement. There is always a possibility that those two costs are way too
different, hence my use of the word "possibly". I could be wrong.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
On Wed, Feb 3, 2016 at 3:08 PM, Andres Freund wrote:
> On 2016-02-02 10:12:25 +0530, Amit Kapila wrote:
>> @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
>>  	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
>>  				  CHECKPOINT_FORCE)) == 0)
>>  	{
>> -		if (prevPtr == ControlFile->checkPointCopy.redo &&
>> +		if (GetProgressRecPtr() == prevPtr &&
>>  			prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>>  		{
>>
>> I think such a check won't consider all the WAL activity that happened
>> during the checkpoint (after checkpoint start, till it finishes), which
>> was not the intention of this code without your patch.
>
> Precisely.
>
>> I think both this and the previous patch's (hs-checkpoints-v1) approach
>> won't fix the issue in all kinds of scenarios.
>
> Agreed.
>
>> Let me try to explain what I think should fix this issue based on the
>> discussion above, suggestions by Andres, and some of my own thoughts:
>>
>> Have a new variable in WALInsertLock that would indicate the last
>> insertion position (last_insert_pos) we write to after holding that lock.
>> Ensure that we don't update last_insert_pos while logging a standby
>> snapshot (a simple way is to pass a flag to XLogInsert, or distinguish it
>> based on the type of record (XLOG_RUNNING_XACTS), or if you can think of
>> a smarter way). Now, both during checkpoint and in bgwriter, to find out
>> whether there has been any activity since last time, we need to check all
>> the WALInsertLocks for the latest insert position (by referring to
>> last_insert_pos) and compare it with the latest position we noted during
>> the last such check, and decide whether to proceed or not based on the
>> comparison result. If you think it is not adequate to store
>> last_insert_pos in WALInsertLock, then we can think of having it in
>> PGPROC.
>
> Yes, that's pretty much what I was thinking of.

I think generally it is a good idea, but one thing worth a thought is that by
doing so, we need to acquire all WAL insertion locks every
LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for every slot. Do you
think that is a matter of concern in any way for write workloads, or won't it
have an effect since we only need to do this periodically?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
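A rough sketch of the check being proposed, to make it concrete -- the
lastInsertPos field and the function itself are hypothetical, not committed
code; only the lock array and locking calls match the existing xlog.c
internals:

/*
 * Sketch: scan every WAL insertion lock and return the most advanced
 * last-insert position; callers compare this against the value noted at
 * the previous check to detect whether any WAL activity happened.
 */
static XLogRecPtr
GetLastInsertPos(void)
{
	XLogRecPtr	result = InvalidXLogRecPtr;
	int			i;

	for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
	{
		XLogRecPtr	pos;

		LWLockAcquire(&WALInsertLocks[i].l.lock, LW_SHARED);
		pos = WALInsertLocks[i].l.lastInsertPos;	/* hypothetical field */
		LWLockRelease(&WALInsertLocks[i].l.lock);

		if (pos > result)
			result = pos;
	}
	return result;
}

This also makes the cost Amit asks about visible: NUM_XLOGINSERT_LOCKS
(currently 8) shared lock acquisitions once per LOG_SNAPSHOT_INTERVAL_MS,
which should be small next to the per-record locking that every WAL insertion
already does.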
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/04 17:58, Etsuro Fujita wrote:
> On 2016/02/04 8:00, Robert Haas wrote:
>> On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas wrote:
>>> On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat wrote:
>>>> PFA patches with naming conventions similar to previous ones.
>>>> pg_fdw_core_v7.patch: core changes
>>>> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
>>>> pg_join_pd_v7.patch: combined patch for ease of testing.

One more: I think the following in postgresGetForeignJoinPaths() is a good
idea, but I think it's okay to just check whether root->rowMarks is non-NIL,
because, since we have rowmarks for all base relations except the target, if
we have root->parse->commandType == CMD_DELETE (or
root->parse->commandType == CMD_UPDATE), then there would be at least one
non-target base relation in the joinrel, which would have a rowmark.

+	if (root->parse->commandType == CMD_DELETE ||
+		root->parse->commandType == CMD_UPDATE ||
+		root->rowMarks)
+	{
+		epq_path = GetPathForEPQRecheck(joinrel);
+		if (!epq_path)
+		{
+			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
+			return;
+		}
+	}

Best regards,
Etsuro Fujita
Re: [HACKERS] silent data loss with ext4 / all current versions
On Thu, Feb 4, 2016 at 12:02 PM, Michael Paquier wrote:
> On Tue, Feb 2, 2016 at 4:20 PM, Michael Paquier wrote:
>> Not wrong, and this leads to the following:
>> void rename_safe(const char *old, const char *new, bool isdir, int elevel);
>> Controlling elevel is necessary per the multiple code paths that would
>> use it. Some use ERROR, most of them FATAL, and a bit of WARNING. Does
>> that look fine?
>
> After really coding it, I finished with the following thing:
> +int
> +rename_safe(const char *old, const char *new)
>
> There is no need to extend that for directories; well, we could of course,
> but all the renames happen on files so I see no need to make that more
> complicated. More refactoring of the other rename() calls could be done as
> well by extending rename_safe() with a flag to fsync the data within a
> critical section, particularly for the replication slot code. I have left
> that out so as not to complicate the patch further.

Andres just poked me (2m away from each other now) regarding the fact that we
should fsync even after the link() calls when HAVE_WORKING_LINK is used. So
we could lose some metadata here. Mea culpa. And the patch misses that.
--
Michael
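For context, the durable-rename pattern under discussion looks roughly like
the following: a minimal POSIX sketch assuming Linux-style fsync semantics,
not the actual patch:

/*
 * Sketch of a crash-safe rename (illustration only).  The file contents,
 * the renamed file, and the parent directory entry all have to reach disk
 * before the operation can be considered durable.
 */
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

static int
fsync_path(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

int
rename_safe(const char *oldpath, const char *newpath)
{
	char		dir[4096];

	/* make sure the source contents are durable before renaming */
	if (fsync_path(oldpath) != 0)
		return -1;
	if (rename(oldpath, newpath) != 0)
		return -1;
	/* persist the rename: fsync the file and its parent directory */
	if (fsync_path(newpath) != 0)
		return -1;
	snprintf(dir, sizeof(dir), "%s", newpath);
	return fsync_path(dirname(dir));
}

Michael's point about link() applies the same way: after
link(oldpath, newpath), the parent directory still has to be fsync'd, or the
new directory entry can be lost in a crash.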
Re: [HACKERS] Raising the checkpoint_timeout limit
On 2016-02-03 15:07:12 -0600, Jim Nasby wrote:
> On 2/2/16 10:10 PM, Robert Haas wrote:
>> Now, you could also set such configuration settings in
>> a situation where it will not work out well. But that is true of most
>> configuration settings.

By that argument we should probably raise the lower limit of a bunch of
parameters :P

> Yeah, if we're going to start playing parent then I think the first thing
> to do is remove the fsync GUC. The AWS team has done testing that shows it
> to be worthless from a performance standpoint now that we have synchronous
> commit, and it's an extremely large foot-bazooka to have lying around.

Meh, I don't buy that. There are workloads where that's the case, but also
ones where it's not true. Try e.g. 2PC. And yes, there are definitely cases
where 2PC makes sense, even if you don't need durability on a local basis.

Andres
Re: [HACKERS] Raising the checkpoint_timeout limit
On Mon, Feb 1, 2016 at 8:16 PM, Noah Misch wrote:
>> I'm not sure what'd actually be a good upper limit. I'd be inclined to
>> even go as high as a week or so. A lot of our settings have
>> upper/lower limits that aren't a good idea in general.
>
> In general, I favor having limits reflect fundamental system limitations
> rather than paternalism. Therefore, I would allow INT_MAX (68 years).

I agree. I'm in favor of having things be what is sometimes called foolproof,
but I think that you can only take that so far, and it's mostly a matter of
guiding a more or less reasonable user in the right direction: making it easy
to do the right thing and hard to do the wrong thing. I don't think you can
effectively design anything around a user that makes perversely bad decisions
at every turn. If you investigate why a user made a bad decision, there will
usually be a chain of faulty but not outrageous reasoning behind it.
--
Peter Geoghegan
Re: [HACKERS] Incorrect formula for SysV IPC parameters
On Wed, Feb 3, 2016 at 12:51 PM, Kyotaro HORIGUCHI wrote:
> Hello, I found that the formulas to calculate SEMMNI and SEMMNS
> are incorrect in 9.2 and later.
>
> http://www.postgresql.org/docs/9.5/static/kernel-resources.html
>
> All of them say the same thing, as follows:
>
> | SEMMNI  Maximum number of semaphore identifiers (i.e., sets)
> |
> |   at least ceil((max_connections + autovacuum_max_workers + 4) / 16)
> |
> | SEMMNS  Maximum number of semaphores system-wide
> |
> |   ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17
> |   plus room for other applications
>
> But the number of semaphores PostgreSQL actually needs is calculated as
> follows in 9.4 and later:
>
> numSemas = MaxConnections + NUM_AUXILIARY_PROCS(=4)
> MaxConnections = max_connections + autovacuum_max_workers + 1 +
>                  max_worker_processes
>
> So, the formula for SEMMNI should be
>
> ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16)
>
> and SEMMNS should have the same fix.
>
> In 9.3 and 9.2, the documentation says the same thing, but the actual
> calculation is as follows:
>
> numSemas = MaxConnections + NUM_AUXILIARY_PROCS(=4)
> MaxConnections = max_connections + autovacuum_max_workers + 1 +
>                  GetNumShmemAttachedBgworkers()
>
> Omitting GetNumShmemAttachedBgworkers, the actual formula is
>
> ceil((max_connections + autovacuum_max_workers + 5) / 16)
>
> In 9.1, NUM_AUXILIARY_PROCS is 3, so the documentation is correct.
>
> I attached two patches, for 9.2-9.3 and 9.4-9.6dev respectively. The patch
> command complains a bit when applying the former on 9.2.

Good catch! ISTM that you also need to change the descriptions of SEMMNI and
SEMMNS under Table 17-1.

Regards,
--
Fujii Masao
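To make the difference concrete, take hypothetical settings
max_connections = 100, autovacuum_max_workers = 3, max_worker_processes = 8
on 9.4+. The documented formula and the corrected one give:

    documented:  SEMMNI >= ceil((100 + 3 + 4) / 16)     = ceil(6.6875) = 7
    corrected:   SEMMNI >= ceil((100 + 3 + 8 + 5) / 16) = ceil(7.25)   = 8
    corrected:   SEMMNS >= 8 * 17 = 136, plus room for other applications

So once background workers enter the picture, the documented value
undercounts by a full semaphore set in this example.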
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
> * Is it safe to replace outerjoinpath with its fdw_outerpath the following
> way? I think that if the join relation represented by outerjoinpath has
> local conditions that can't be executed remotely, we have to keep
> outerjoinpath in the path tree; we will otherwise fail to execute the
> local conditions. No?
>
> +		/*
> +		 * If either inner or outer path is a ForeignPath corresponding to
> +		 * a pushed down join, replace it with the fdw_outerpath, so that we
> +		 * maintain path for EPQ checks built entirely of local join
> +		 * strategies.
> +		 */
> +		if (IsA(joinpath->outerjoinpath, ForeignPath))
> +		{
> +			ForeignPath *foreign_path;
> +
> +			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
> +			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
> +				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
> +		}

All the conditions (local and remote) should be part of fdw_outerpath as
well, since that's the alternate local path, which should produce (when
converted to a plan) the same result as the foreign path. fdw_outerpath
should be a local path set when paths for outerjoinpath->parent were being
created. Am I missing something?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita wrote:
> On 2016/01/28 15:20, Rushabh Lathia wrote:
>> On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita wrote:
>>> On 2016/01/27 21:23, Rushabh Lathia wrote:
>>>> If I understood correctly, the above documentation means that if an FDW
>>>> has the DMLPushdown APIs, that is enough. But in reality that's not the
>>>> case; we need ExecForeignInsert, ExecForeignUpdate, or
>>>> ExecForeignDelete in case the DML is not pushable.
>>>>
>>>> And here the fact is that the DMLPushdown APIs are optional for an FDW,
>>>> so if an FDW doesn't have the DMLPushdown APIs it can still very well
>>>> perform the DML operations using ExecForeignInsert, ExecForeignUpdate,
>>>> or ExecForeignDelete.
>>>>
>>>> So the documentation should be like:
>>>>
>>>> If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are
>>>> assumed to be insertable, updatable, or deletable if the FDW provides
>>>> ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>>>> respectively.
>>>>
>>>> If the FDW provides the DMLPushdown APIs and the DML is pushable to the
>>>> foreign server, then the FDW still needs ExecForeignInsert,
>>>> ExecForeignUpdate, or ExecForeignDelete for the non-pushable DML
>>>> operations.
>>>>
>>>> What's your opinion?
>>>
>>> I agree that we should add this to the documentation, too.
>
> I added docs to the IsForeignRelUpdatable documentation. Also, a brief
> introductory remark has been added at the beginning of the DML pushdown
> APIs' documentation.
>
>>>> BTW, if I understand correctly, I think we should also modify
>>>> relation_is_updatable() accordingly. Am I right?
>>>
>>> Yep, we need to modify relation_is_updatable().
>
> I thought I'd modify that function in the same way as
> CheckValidResultRel(), but I noticed that we cannot do that, because
> during relation_is_updatable() we don't have any information on whether
> each update will be pushed down to the remote server by PlanDMLPushdown.
> So I left that function as-is. relation_is_updatable() is just used for
> display in the information_schema views, so ISTM that that function is
> fine as-is. (As for CheckValidResultRel(), I revised it so as to check
> the presence of the DML pushdown APIs after checking the existing APIs,
> if the given command will be pushed down. The reason is that we assume
> the presence of the existing APIs anyway.)
>
> I revised other docs and some comments, mostly for consistency.
>
> Attached is an updated version of the patch, which has been created on
> top of the updated version of the bugfix patch posted by Robert in [1]
> (attached).

I just started reviewing this and realized that the patch does not apply
cleanly on the latest sources; it has some conflicts. Could you please
upload a rebased version of the patch?

[1] http://www.postgresql.org/message-id/ca+tgmoz40j2uc5ac1nxu03oj4crvolks15xx+ptfp-1u-8z...@mail.gmail.com

--
Rushabh Lathia
Re: [HACKERS] Using quicksort for every external sort run
On Thu, Feb 4, 2016 at 1:46 AM, Peter Geoghegan wrote:
> It wasn't my original insight that replacement selection has become
> all but obsolete. It took me a while to come around to that point of
> view.

Nyberg et al may have said it best in 1994, in the AlphaSort paper [1]:

"By comparison, OpenVMS sort uses a pure replacement-selection sort to
generate runs (Knuth, 1973). Replacement-selection is best for a
memory-constrained environment. On average, replacement-selection generates
runs that are twice as large as available memory, while the QuickSort runs
are typically less than half of available memory. However, in a memory-rich
environment, QuickSort is faster because it is simpler, makes fewer exchanges
on average, and has superior address locality to exploit processor caching."

(I believe that the authors state that "QuickSort runs are typically less
than half of available memory" because of their use of explicit asynchronous
I/O in each thread, which doesn't apply to us.)

The paper also has a very good analysis of the economics of sorting:

"Even for surprisingly large sorts, it is economical to perform the sort in
one pass."

Of course, memory capacities have scaled enormously in the 20 years since
this analysis was performed, so the analysis applies even at the very low end
these days. The high-capacity memory system that they advocate to get a
one-pass sort (instead of having faster disks) had 100MB of memory, which is
of course tiny by contemporary standards. If you pay Heroku $7 a month, you
get a "Hobby Tier" database with 512MB of memory. The smallest EC2 instance
size, the t2.nano, costs about $1.10 to run for one week, and has 0.5GB of
memory. The economics of using 4MB or even 20MB to sort 10GB of data are
already preposterously bad for everyone that runs a database server, no
matter how budget conscious they may be.

I can reluctantly accept that we still need to use a heap with very low
work_mem settings to avoid the risk of a regression (in the event of a strong
correlation) on general principle, but I'm well justified in proposing "just
don't do that" as the best practical advice. I thought I had your agreement
on that point, Robert; is that actually the case?

[1] http://www.cs.berkeley.edu/~rxin/db-papers/alphasort.pdf
--
Peter Geoghegan
Re: [HACKERS] checkpoints after database start/immediate checkpoints
On 2016-02-03 09:57:00 -0500, Robert Haas wrote:
> On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund wrote:
>> I wonder if this essentially points at checkpoint_timeout being wrongly
>> defined: currently it means we'll try to finish a checkpoint
>> (1 - checkpoint_completion_target) * timeout before the next one -- but
>> perhaps it should instead be that we start checkpoint_timeout * _target
>> before the next timeout? AFAICS that'd work more gracefully in the face
>> of restarts and forced checkpoints.
>
> There's a certain appeal to that, but at the same time it seems pretty
> wonky. Right now, you can say that a checkpoint is triggered when the
> amount of WAL reaches X or the amount of time reaches Y, but with the
> alternative definition it's a bit harder to explain what's going on
> there.

Hm, but can you, really? We *start* a checkpoint every checkpoint_timeout,
but we only finish it after checkpoint_completion_target * timeout, or
cct * segments. I find it pretty hard to explain that we have a gap of
checkpoint_timeout, where nothing happens, after an immediate/manual
checkpoint.

Defining it as "we try to *finish* a checkpoint every checkpoint_timeout, or
every checkpoint_segments/(max_wal_size/~3)" actually seems simpler to me.
Then we just need to add that we start a checkpoint
checkpoint_completion_target before either of the above is reached.

- Andres
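A worked example of the two definitions, using the hypothetical settings
checkpoint_timeout = 300s and checkpoint_completion_target = 0.5:

    current:  a checkpoint starts at t=0 and spreads its writes so it
              finishes by t = 0.5 * 300 = 150; nothing happens between
              t=150 and the next start at t=300. After an immediate or
              manual checkpoint at t=0, the next timed one still only
              starts at t=300 -- the gap Andres describes.
    proposed: a checkpoint must *finish* by t=300, so write-out starts at
              t = 300 - 0.5 * 300 = 150; a forced checkpoint doesn't
              create a checkpoint_timeout-sized dead interval, because
              the t=300 finish deadline stays put.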
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Thu, Feb 4, 2016 at 11:55 AM, Ashutosh Bapat wrote:
> Patches attached with the previous mail.

The core patch seemed to me to be in good shape now, so I committed that. Not
sure I'll be able to get to another read-through of the main patch today.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Thu, Feb 4, 2016 at 7:00 AM, Amit Kapila wrote:
> [ new patch ]

I've committed this after heavy rewriting. In particular, I merged two loops,
used local variables to get rid of the very long and quite repetitive lines,
and did various cleanup on the documentation and comments.

I think we ought to move the buffer mapping, lock manager, and predicate lock
manager locks into their own tranches also, perhaps using this new
named-tranche facility.

In addition, I think we should get rid of NumLWLocks(). It's no longer
calculating anything; it just returns NUM_FIXED_LWLOCKS, and with the tranche
moves described above, it will simply return NUM_INDIVIDUAL_LWLOCKS. We don't
need a function just to do that; the assignment it does can be moved to the
relevant caller.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
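For extension authors, the committed named-tranche API is used roughly like
this (a minimal sketch; the extension name "my_ext" and the surrounding
structure are made up):

/*
 * Sketch of the named-tranche API from an extension's point of view.
 * RequestNamedLWLockTranche() must run from _PG_init() of a library
 * loaded via shared_preload_libraries; the lock itself is fetched later,
 * in the shmem startup hook.
 */
#include "postgres.h"
#include "storage/lwlock.h"

static LWLock *my_lock;

void
_PG_init(void)
{
	/* reserve one LWLock under a tranche named after the extension */
	RequestNamedLWLockTranche("my_ext", 1);
}

/* hypothetical shmem_startup_hook body */
static void
my_ext_shmem_startup(void)
{
	my_lock = &(GetNamedLWLockTranche("my_ext"))->lock;
}

The array form also works: with a request of N locks,
GetNamedLWLockTranche() returns a pointer to N LWLockPadded entries, so
indexed access like [i].lock picks the i-th lock of the tranche.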
Re: [HACKERS] UNIQUE capability to hash indexes
Alvaro Herrera writes:
> Tom Lane wrote:
>> Not that I've heard of. It's very hard to muster any enthusiasm for
>> improving hash indexes unless their lack of WAL-logging is fixed first.
>
> This is really strange though. Surely adding WAL-logging is not an
> enormous task anymore ... I mean, we're undertaking far larger efforts
> now, the WAL logging code is simpler than before, and we even have a
> tool (ok, gotta streamline that one a little bit) to verify that the
> results are correct.

ISTR that we discussed this previously and ran into some stumbling block or
other that made it less-than-trivial. Don't recall what though.

			regards, tom lane
Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?
David G. Johnston wrote:
> Learning by reading here...
>
> http://www.postgresql.org/docs/current/static/wal-internals.html
> """
> After a checkpoint has been made and the log flushed, the checkpoint's
> position is saved in the file pg_control. Therefore, at the start of
> recovery, the server first reads pg_control and then the checkpoint
> record; then it performs the REDO operation by scanning forward from the
> log position indicated in the checkpoint record. Because the entire
> content of data pages is saved in the log on the first page modification
> after a checkpoint (assuming full_page_writes is not disabled), all pages
> changed since the checkpoint will be restored to a consistent state.
>
> To deal with the case where pg_control is corrupt, we should support the
> possibility of scanning existing log segments in reverse order — newest
> to oldest — in order to find the latest checkpoint. This has not been
> implemented yet. pg_control is small enough (less than one disk page)
> that it is not subject to partial-write problems, and as of this writing
> there have been no reports of database failures due solely to the
> inability to read pg_control itself. So while it is theoretically a weak
> spot, pg_control does not seem to be a problem in practice.
> """
>
> The above comment appears out-of-date if this post describes what
> presently happens.

I think you're misinterpreting Andres, or the docs, or both.

What Andres says is that the control file (pg_control) stores two checkpoint
locations: the latest one, and the one before that. When recovery occurs, it
starts by looking up the latest checkpoint record; if it cannot find that for
whatever reason, it falls back to reading the previous one. (He further
claims that falling back to the previous one is a bad idea.)

What the second paragraph in the documentation is saying is something
different: it is talking about reading all the pg_xlog files (in reverse
order), which is not pg_control, to see what checkpoint records are there,
and then figuring out which one to use.
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] insufficient qualification of some objects in dump files
Peter Eisentraut wrote:
> Some dump objects whose names are not unique at the schema level have
> insufficient detail in the dump TOC. For example, a column default might
> have a TOC entry like this:
>
> 2153; 2604 39696 DEFAULT public a rolename
>
> I think we should amend the archive tag for these kinds of objects to
> include the table name, so it might look like:
>
> 2153; 2604 39696 DEFAULT public test a rolename
>
> Comments?

If we're changing anything, I wonder if it makes sense to go for adding the
object identity there. Perhaps that would make the whole thing a little more
regular, rather than having to count elements depending on object type. So in
this case it'd be

2153; 2604 39696 DEFAULT public.test.a rolename
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?
On Thu, Feb 4, 2016 at 3:57 PM, Alvaro Herrera wrote:
> David G. Johnston wrote:
>> Learning by reading here...
>>
>> http://www.postgresql.org/docs/current/static/wal-internals.html
>> [documentation text quoted in full upthread]
>>
>> The above comment appears out-of-date if this post describes what
>> presently happens.
>
> I think you're misinterpreting Andres, or the docs, or both.
>
> What Andres says is that the control file (pg_control) stores two
> checkpoint locations: the latest one, and the one before that. When
> recovery occurs, it starts by looking up the latest checkpoint record; if
> it cannot find that for whatever reason, it falls back to reading the
> previous one. (He further claims that falling back to the previous one is
> a bad idea.)
>
> What the second paragraph in the documentation is saying is something
> different: it is talking about reading all the pg_xlog files (in reverse
> order), which is not pg_control, to see what checkpoint records are there,
> and then figuring out which one to use.

Yes, I inferred something that obviously isn't true: that the system never
goes hunting for a valid checkpoint to begin recovery from. While it does
not do so in the case of a corrupted pg_control file, I further assumed it
never did. That is because the documentation doesn't state that two
checkpoint positions exist and that PostgreSQL will try the second one if
the first one proves unusable. Given the topic of this thread, that omission
makes the documentation out-of-date. Maybe it's covered elsewhere, but since
this section addresses locating a starting point, I would expect any such
description to be here as well.

David J.
Re: [HACKERS] checkpoints after database start/immediate checkpoints
On Thu, Feb 4, 2016 at 4:42 PM, Andres Freund wrote:
> On 2016-02-03 09:57:00 -0500, Robert Haas wrote:
>> On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund wrote:
>>> I wonder if this essentially points at checkpoint_timeout being wrongly
>>> defined: currently it means we'll try to finish a checkpoint
>>> (1 - checkpoint_completion_target) * timeout before the next one -- but
>>> perhaps it should instead be that we start checkpoint_timeout * _target
>>> before the next timeout? AFAICS that'd work more gracefully in the face
>>> of restarts and forced checkpoints.
>>
>> There's a certain appeal to that, but at the same time it seems pretty
>> wonky. Right now, you can say that a checkpoint is triggered when the
>> amount of WAL reaches X or the amount of time reaches Y, but with the
>> alternative definition it's a bit harder to explain what's going on
>> there.
>
> Hm, but can you, really? We *start* a checkpoint every checkpoint_timeout,
> but we only finish it after checkpoint_completion_target * timeout, or
> cct * segments. I find it pretty hard to explain that we have a gap of
> checkpoint_timeout, where nothing happens, after an immediate/manual
> checkpoint.
>
> Defining it as "we try to *finish* a checkpoint every checkpoint_timeout,
> or every checkpoint_segments/(max_wal_size/~3)" actually seems simpler to
> me. Then we just need to add that we start a checkpoint
> checkpoint_completion_target before either of the above is reached.

Hmm, I could go for that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] UNIQUE capability to hash indexes
On 02/04/2016 11:34 PM, Tom Lane wrote:
> Alvaro Herrera writes:
>> This is really strange though. Surely adding WAL-logging is not an
>> enormous task anymore ... I mean, we're undertaking far larger efforts
>> now, the WAL logging code is simpler than before, and we even have a
>> tool (ok, gotta streamline that one a little bit) to verify that the
>> results are correct.
>
> ISTR that we discussed this previously and ran into some stumbling
> block or other that made it less-than-trivial. Don't recall what
> though.

The last discussion I can recall was Robert's ideas on how to solve the
splitting of buckets:

http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com

Andreas
Re: [HACKERS] UNIQUE capability to hash indexes
Tom Lane wrote:
> Shubham Barai writes:
>> I wanted to know if anyone is working on these projects from the todo
>> list:
>> 1. Add UNIQUE capability to hash indexes
>> 2. Allow multi-column hash indexes
>
> Not that I've heard of. It's very hard to muster any enthusiasm for
> improving hash indexes unless their lack of WAL-logging is fixed first.

This is really strange though. Surely adding WAL-logging is not an enormous
task anymore ... I mean, we're undertaking far larger efforts now, the WAL
logging code is simpler than before, and we even have a tool (ok, gotta
streamline that one a little bit) to verify that the results are correct.
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?
On 2016-02-03 09:28:24 -0500, Robert Haas wrote:
> Would we still have some way of forcing the older checkpoint record to
> be used if somebody wants to try to do that?

I think currently the best way to force an arbitrary checkpoint to be used is
creating a "custom" backup label. Not that nice. Not sure if we need
something nice here; I don't really see a frequent need for this.

We could add another option to pg_resetxlog alternatively :/

Regards,

Andres
Re: [HACKERS] count_nulls(VARIADIC "any")
Pavel Stehule writes:
> [ num_nulls_v6.patch ]

I started looking through this. It seems generally okay, but I'm not very
pleased with the function name "num_notnulls". I think it would be better as
"num_nonnulls", as I see Oleksandr suggested already.

			regards, tom lane
Re: [HACKERS] Development with Eclipse - Wrong error messages in IDE
Peter Moser wrote:
> I have some strange error messages inside Eclipse, saying that some
> symbols cannot be resolved. I work with version 9.6 currently. For
> instance:
>
> Symbol 'RM_HEAP_ID' could not be resolved
> src/backend/access/heap/heapam.c
>
> It affects all occurrences of symbols that are defined in
> src/include/access/rmgrlist.h. Eclipse just says "Syntax error" here.
>
> However, the source code compiles and runs without any compile-time error
> or warning. It is just an IDE problem I think, but it distracts me from
> finding real bugs.

Disclaimer: I've never used Eclipse.

The problem is some perhaps-too-clever stuff we do to avoid repetitive
declarations of things. The rmgr stuff uses a PG_RMGR macro, which is defined
differently in src/backend/access/transam/rmgr.c and
src/include/access/rmgr.h; the latter contains the actual enum definition.
Eclipse is trying to be too clever by processing the C files, but not
actually getting it completely right (which is understandable, really). We
have other similar cases, such as grammar keywords (kwlist.h).

I'm afraid that you'd have to teach Eclipse to deal with such things (which
might be tricky) or live with it.
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
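For anyone unfamiliar with the pattern, it looks roughly like this
(simplified from rmgr.h and rmgr.c; the trailing PG_RMGR arguments are
elided in the comment for brevity):

/*
 * Sketch of the X-macro pattern that trips up Eclipse.  rmgrlist.h is
 * just a list of PG_RMGR(...) entries, e.g.:
 *
 *     PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, ...)
 *     PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, ...)
 *
 * and each consumer defines PG_RMGR before including the list.
 */

/* rmgr.h: expand each entry to an enum member */
#define PG_RMGR(symname, name, redo, desc, identify, startup, cleanup) \
	symname,

typedef enum RmgrIds
{
#include "access/rmgrlist.h"
	RM_NEXT_ID
} RmgrIds;
#undef PG_RMGR

/* rmgr.c: expand the same entries into the dispatch table instead */
#define PG_RMGR(symname, name, redo, desc, identify, startup, cleanup) \
	{name, redo, desc, identify, startup, cleanup},

const RmgrData RmgrTable[RM_MAX_ID + 1] = {
#include "access/rmgrlist.h"
};

The same rmgrlist.h thus drives both the RmgrIds enum and the RmgrTable
array, which is exactly what a one-pass IDE indexer tends to get wrong.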
Re: [HACKERS] Support for N synchronous standby servers - take 2
Hello,

At Thu, 4 Feb 2016 23:06:45 +0300, Michael Paquier wrote in
Re: [HACKERS] Batch update of indexes
On 2/4/16 1:37 AM, konstantin knizhnik wrote:
>> My suspicion is that it would be useful to pre-order the new data before
>> trying to apply it to the indexes.
>
> Sorry, but ALTER INDEX is expected to work for all indexes, not only
> B-Tree, and for them sorting may not be possible... But for B-Tree,
> presorting inserted data should certainly increase performance. I will
> think about it.

I wasn't talking about ALTER INDEX. My theory is that if you're doing a large
DML operation it might be more efficient to update an index as a single bulk
operation, instead of doing it for each tuple. If you want to do that, then
you need an efficient method for finding everything that a DML statement
changed. That's the exact same thing we need to support statement-level
triggers being able to reference NEW and OLD. It's probably also what we need
to support incrementally updated matviews.

If we had such a capability then we could add options to the AM
infrastructure to allow indexes to support bulk maintenance as well as
per-tuple maintenance (or even support only bulk maintenance...).

I don't think any of that has anything to do with ALTER INDEX.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?
On February 5, 2016 2:52:20 AM GMT+03:00, Jim Nasby wrote:
> On 2/4/16 3:37 PM, Andres Freund wrote:
>> On 2016-02-03 09:28:24 -0500, Robert Haas wrote:
>>> Would we still have some way of forcing the older checkpoint record to
>>> be used if somebody wants to try to do that?
>>
>> I think currently the best way to force an arbitrary checkpoint to be
>> used is creating a "custom" backup label. Not that nice. Not sure if we
>> need something nice here; I don't really see a frequent need for this.
>>
>> We could add another option to pg_resetxlog alternatively :/
>
> I guess you'd have to scan through WAL files by hand to find the next
> oldest checkpoint?

Just look at pg_control; it contains the previous location.
---
Please excuse brevity and formatting - I am writing this on my mobile phone.
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, Feb 4, 2016 at 11:58 PM, Stas Kelvich wrote:
>> On 04 Feb 2016, at 12:59, Michael Paquier wrote:
>>> 0) There are several routines that do the actual checking, like
>>> is/command_ok/command_fails. I think it would be very handy to have
>>> wrappers psql_ok/psql_fails that call psql through command_ok/fails.
>>
>> Do you have a test case in mind for it?
>
> Yes, I've used that to test prepare/commit during recovery (script
> attached; it's in WIP state, I'll submit it later along with the other
> two-phase stuff).

Oh, OK, now I see. Well, it seems to make sense for your case, though it does
not seem to be directly linked to the patch here. We could incrementally add
something on top of the existing infrastructure that gets into the code tree
once the 2PC patch is in more advanced shape.

>>> 2) --enable-tap-tests deserves a mention in test/recovery/README and a
>>> more obvious error message when one tries to run make check in
>>> test/recovery without --enable-tap-tests.
>>
>> When running without --enable-tap-tests from src/test/recovery you
>> would get the following error, per how prove_check is defined:
>> "TAP tests not enabled"
>
> Yes, but that message doesn't mention --enable-tap-tests, and the README
> is also silent about it. I didn't know about that flag and had to search
> the makefiles for this error message to see what conditions lead to it. I
> think we can save the planet from one more Stack Overflow question if the
> error message mentions that flag.

Well, that works for the whole TAP test infrastructure and not really this
patch only. Let's not forget that the goal of this thread is to provide a
basic set of tests and routines to help people build test cases for more
advanced clustering scenarios, so I'd rather not complicate the code with
side issues and remain focused on the core problem.
--
Michael
Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?
On 2/4/16 5:09 PM, David G. Johnston wrote: What the 2nd para in the documentation is saying is something different: it is talking about reading all the pg_xlog files (in reverse order), which is not pg_control, and seeing what checkpoint records are there, then figuring out which one to use. Yes, I inferred something that obviously isn't true - that the system doesn't go hunting for a valid checkpoint to begin recovery from. While it does not do so in the case of a corrupted pg_control file, I further assumed it never did. That would be because the documentation doesn't make the point of stating that two checkpoint positions exist and that PostgreSQL will try the second one if the first one proves unusable. Given the topic of this thread, that omission makes the documentation out-of-date. Maybe it's covered elsewhere, but since this section addresses locating a starting point I would expect any such description to be here as well. Yeah, I think we should fix the docs. Especially since I imagine that if you're reading that part of the docs you're probably having a really bad day, and bad info won't help you... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?
On 2/4/16 3:37 PM, Andres Freund wrote: On 2016-02-03 09:28:24 -0500, Robert Haas wrote: Would we still have some way of forcing the older checkpoint record to be used if somebody wants to try to do that? I think currently the best way to force an arbitrary checkpoint to be used is creating a "custom" backup label. Not that nice. Not sure if we need something nice here, I don't really see a frequent need for this. We could add another option to pg_resetxlog alternatively :/ I guess you'd have to scan through WAL files by hand to find the next oldest checkpoint? I'm guessing that if this is happening in the field there's a decent chance people aren't noticing it, so maybe the best thing for now is to turn off the automatic behavior but still have a relatively easy way to re-enable it. In case this is more common than we think... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
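For the by-hand scan, pg_xlogdump already takes some of the pain out of it: running pg_xlogdump --rmgr=XLOG over the relevant segments lists the CHECKPOINT_ONLINE and CHECKPOINT_SHUTDOWN records with their LSNs, which is at least a starting point for hand-crafting a backup label pointing at an older checkpoint.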
Re: [HACKERS] proposal: PL/Pythonu - function ereport
On 2/4/16 3:13 AM, Catalin Iacob wrote: Thanks for the overview. Very helpful. > I find existing behaviour for 2, 3 and 4 unlike other Python APIs I've seen, surprising and not very useful. If I want to log a tuple I can construct and pass a single tuple; I don't see why plpy.info() needs to construct it for me. And for the documented, single-argument call nothing changes. Agreed, that usage is wonky. > The question to Bruce (and others) is: is it ok to change to the new behaviour illustrated and change meaning for usages like 2, 3 and 4? If any users have a bunch of code that depends on the old behavior, they're going to be rather irritated if we break it. If we want to deprecate it then I think we need a GUC that allows you to get the old behavior back. > If we don't want that, the solution Pavel and I see is introducing a parallel API named plpy.raise_info or plpy.rich_info or something like that with the new behaviour and leave the existing functions unchanged. Another option is some compatibility GUC but I don't think it's worth the trouble and confusion. If we're going to provide an alternative API, I'd just do plpy.raise(LEVEL, ...). At this point, my vote would be: add a plpython.ereport_mode GUC that has 3 settings: current (deprecated) behavior, allow ONLY 1 argument, new behavior. The reason for the 1-argument option is it makes it much easier to find code that's still using the old behavior. I think it's also worth having plpy.raise(LEVEL, ...) as an alternative. If folks feel that's overkill then I'd vote to leave the existing behavior alone and just add plpy.raise(LEVEL, ...). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On 2/4/16 12:30 AM, Kouhei Kaigai wrote: >> 2. A feature to suspend i/o write-out towards particular blocks >> that are registered by other concurrent backends, until it is >> unregistered (usually, at the end of P2P DMA). >> ==> to be discussed. I think there's still a race condition here though... A: finds buffer not in shared buffers. B: reads buffer in, modifies buffer, starts writing buffer to OS. A: makes the call to block writes, but the write is already in progress; thinks writes are now blocked; reads corrupted block. Much hilarity ensues. Or maybe you were just glossing over that part for brevity. ... > I tried to design a draft of enhancement to realize the above i/o write-out > suspend/resume, in the least invasive way possible. > > ASSUMPTION: I intend to implement this feature as part of an extension, > because these i/o suspend/resume checks are pure overhead > for the core features, unless an extension utilizes them. > > Three functions shall be added: > > extern int GetStorageMgrNumbers(void); > extern f_smgr GetStorageMgrHandlers(int smgr_which); > extern void SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers); > > As literal, GetStorageMgrNumbers() returns the number of storage managers > currently installed. It always returns 1 right now. > GetStorageMgrHandlers() returns the currently configured f_smgr table for > the supplied smgr_which. It allows extensions to know the current configuration > of the storage manager, even if another extension already modified it. > SetStorageMgrHandlers() assigns the supplied 'smgr_handlers', instead of > the current one. > If an extension wants to intermediate 'smgr_write', the extension will replace > 'smgr_write' with its own function, then call the original function, likely > mdwrite, from the alternative function. > > In this case, the call chain shall be: > > FlushBuffer, and others... > +-- smgrwrite(...) > +-- (extension's own function) > +-- mdwrite ISTR someone (Robert Haas?) complaining that this method of hooks is cumbersome to use and can be fragile if multiple hooks are being installed. So maybe we don't want to extend its usage... I'm also not sure whether this is better done with an smgr hook or a hook into shared buffer handling... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
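For concreteness, here is a minimal sketch of what such an intermediating extension could look like under the draft above. Everything here is hypothetical: the Get/SetStorageMgrHandlers() API is only proposed, the f_smgr struct would need to become visible to extensions, and register_block_write()/unregister_block_write() stand in for the P2P DMA bookkeeping.

    #include "postgres.h"
    #include "fmgr.h"
    #include "storage/smgr.h"

    PG_MODULE_MAGIC;

    /* hypothetical helpers maintaining the "block is being written" table */
    extern void register_block_write(SMgrRelation reln, ForkNumber forknum,
                                     BlockNumber blocknum);
    extern void unregister_block_write(SMgrRelation reln, ForkNumber forknum,
                                       BlockNumber blocknum);

    static f_smgr saved_handlers;   /* the original handlers, normally md.c's */

    static void
    my_smgr_write(SMgrRelation reln, ForkNumber forknum,
                  BlockNumber blocknum, char *buffer, bool skipFsync)
    {
        /* suspend new P2P DMA registrations on this block while it is written */
        register_block_write(reln, forknum, blocknum);

        /* delegate to the original handler (normally mdwrite) */
        (*saved_handlers.smgr_write) (reln, forknum, blocknum,
                                      buffer, skipFsync);

        unregister_block_write(reln, forknum, blocknum);
    }

    void
    _PG_init(void)
    {
        f_smgr      handlers;

        saved_handlers = GetStorageMgrHandlers(0);
        handlers = saved_handlers;
        handlers.smgr_write = my_smgr_write;
        SetStorageMgrHandlers(0, handlers);
    }

The fragility concern is visible even in this sketch: a second extension doing the same dance must remember to call the saved handler rather than mdwrite directly, or the first extension's hook silently drops out of the chain.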
[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, Feb 04, 2016 at 09:13:46PM +0300, Victor Wagner wrote: > On Thu, 4 Feb 2016 18:33:27 +0300 Michael Paquier wrote: > > > Really, it is not so hard to add configure checks for perl modules. > > > And we need to test not only for IPC::Run, but for Test::More too, > > > because some Linux distributions put modules which come with perl > > > into a separate package. > > > > The last time we discussed about that on this list we concluded that > > it was not really necessary to have such checks, for one it makes the > > code more simple, and because this is leveraged by the presence of > > --enable-tap-tests, tests which can get actually costly with > > check-world. But this is digressing the subject of this thread, which > > deals with the fact of having recovery tests integrated in core... > > Of course, such configure tests should be run only if > --enable-tap-tests is passed to the configure script. > > It would look like > > if test "$enable_tap_tests" = "yes"; then > AX_PROG_PERL_MODULES([Test::More], , AC_MSG_ERROR([Test::More is > necessary to run TAP tests])) > AX_PROG_PERL_MODULES([IPC::Run], , AC_MSG_ERROR([IPC::Run is > necessary to run TAP tests])) > fi > > in configure.in. > > Maybe it is not strictly necessary, but it is really useful to see > such problems as a clear error message during the configure stage, rather > than successfully configure, compile, run tests and only then find out > that something is forgotten. > > I don't see why having such tests in configure.in makes the code more > complex. It just prevents configure from finishing successfully if > --enable-tap-tests is specified and the required modules are not available. Even if detecting missing modules at "configure" time is the right thing, it belongs in a distinct patch, discussed on a distinct thread. The absence of IPC::Run affects the proposed replication tests in the same way it affects current TAP suites, so this thread has no business revisiting it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcommitfest reply having corrupted subject line
On Thu, Feb 04, 2016 at 09:19:19AM +0100, Magnus Hagander wrote: > On Thu, Feb 4, 2016 at 7:26 AM, Noah Misch wrote: > > The following message, which bears "User-Agent: pgcommitfest", > > > > > > http://www.postgresql.org/message-id/flat/20160202164101.1291.30526.p...@coridan.postgresql.org > > > > added spaces after commas in its subject line, compared to the subject > > line of > > its In-Reply-To message. > > > > new subject line: Re: Add generate_series(date, date) and > > generate_series(date, date, integer) > > previous subject line: Re: Add generate_series(date,date) and > > generate_series(date,date,integer) > > > > I see no way to manually alter the subject line from the > > commitfest.postgresql.org review UI. Is commitfest.postgresql.org > > initiating > > this change internally? > > > > That is weird. > > The CF app certainly doesn't do that intentionally - it copies it from the > archives. And if I look in the db on the cf app, it has the subject without > the space as the field that it copied :O > > The code is line 355-358 at: > http://git.postgresql.org/gitweb/?p=pgcommitfest2.git;a=blob;f=pgcommitfest/commitfest/views.py;h=b4f19b2db470e5269943418b2402ca9ddfff0dff;hb=fec3b2431730c131a206a170a99a7610cdbacc6b#l355 > > the "subject" field in the db that we copy does not have the spaces... I > honestly have no idea where they are coming from :O I'm guessing it must be > something internally in the python libraries that create the MIME. So it is. The problem happens when email.mime.text.MIMEText wraps the subject line; see attached test program. > Have you > seen this with any other messages, that you can recall, or just this one? Just this one. However, no other hackers subject line since 2015-01-01 contains a comma followed by something other than a space.

import email.mime.text

# A short subject fits on one line and passes through unchanged.
msg = email.mime.text.MIMEText('body text', _charset='utf-8')
msg['Subject'] = 'Re: generate_series(date,date)'
print(msg.as_string())

# Padding pushes this subject past the fold point; the header gets
# wrapped, and that is where the extra whitespace appears.
msg = email.mime.text.MIMEText('body text', _charset='utf-8')
msg['Subject'] = ('Re: '
                  '                                        '
                  '                                        '
                  'generate_series(date,date) '
                  '')
print(msg.as_string())

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNIQUE capability to hash indexes
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Not that I've heard of. It's very hard to muster any enthusiasm for > >> improving hash indexes unless their lack of WAL-logging is fixed first. > > > This is really strange though. Surely adding WAL-logging is not an > > enormous task anymore ... I mean, we're undertaking far larger efforts > > now, the WAL logging code is simpler than before, and we even have a > > tool (ok, gotta streamline that one a little bit) to verify that the > > results are correct. > > ISTR that we discussed this previously and ran into some stumbling block > or other that made it less-than-trivial. Don't recall what though. Concurrency of bucket split is the issue. It makes sense to fix the problem before anyone tries to implement WAL. Otherwise WAL will have to be reworked from scratch someday. I had some ideas which I even published: http://www.postgresql.org/message-id/32423.1427413442@localhost Then I spent some more time on it sometime in October and improved the concept a bit (and also found bugs in the version I had published, so please don't spend much time looking at it). I also wrote a function to check if the index is consistent (possibly to be added to the pageinspect extension). I even wrote a function that inserts tuples only into the index, not into the heap - I suppose that should make comparison of index performance with and without the patch simpler. Now, besides making the patch easier to read, I need to test it thoroughly. The lack of time is one problem, but I need to admit that it's a personal issue too :-) So far I think I have a good idea, but now I should try hard to break it. Now that I see the problem mentioned again, I feel kind of "ignited". I expect to have some leisure time at the end of February, so I'll test the patch and post my results early in March. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: multiple psql option -c
On 12/29/15 10:38 AM, Bruce Momjian wrote: > On Thu, Dec 10, 2015 at 11:10:55AM -0500, Robert Haas wrote: >> On Wed, Dec 9, 2015 at 12:15 AM, Pavel Stehule wrote: >> On Wed, Dec 9, 2015 at 2:07 PM, Pavel Stehule wrote: > should be noted, recorded somewhere so this introduce possible > compatibility > issue - due default processing .psqlrc. That's written in the commit log, so I guess that's fine. >>> ook >> >> Bruce uses the commit log to prepare the release notes, so I guess >> he'll make mention of this. > > Yes, I will certainly see it. I generally use the master branch psql for normal work, and this change has caused massive breakage for me. It's straightforward to fix, but in some cases the breakage is silent, for example if you do something=$(psql -c ...) and the .psqlrc processing causes additional output. I'm not sure what to make of it yet, but I want to mention it, because I fear there will be heartache. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
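A concrete instance of the silent case: if your .psqlrc contains \timing, then something=$(psql -c 'select ...') now captures an extra "Timing is on." line along with the query output, and nothing visibly fails until whatever consumes the shell variable chokes on it downstream.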
Re: [HACKERS] 2016-01 Commitfest
Hello Andres, > I'm working on/off on the checkpointer flushing thing, so I don't think it makes sense for somebody else to take that over at this point. Yep. Still, I wish you could share your current working version. The last version sent is from November 11th, and I think someone said that it does not apply cleanly anymore on head. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 4 February 2016 at 14:34, Fujii Masao wrote: > On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas wrote: >> On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao wrote: >>> So you disagree with only third version that I proposed, i.e., >>> adding some hooks for sync replication? If yes and you're OK >>> with the first and second versions, ISTM that we almost reached >>> consensus on the direction of multiple sync replication feature. >>> The first version can cover "one local and one remote sync standbys" case, >>> and the second can cover "one local and at least one from several remote >>> standbys" case. I'm thinking to focus on the first version now, >>> and then we can work on the second to support the quorum commit >> >> Well, I think the only hard part of the third problem is deciding on >> what syntax to use. It seems like a waste of time to me to go to a >> bunch of trouble to implement #1 and #2 using one syntax and then have >> to invent a whole new syntax for #3. Seriously, this isn't that hard: >> it's not a technical problem. It's just that we've got a bunch of >> people who can't agree on what syntax to use. IMO, you should just >> pick something. You're presumably the committer for this patch, and I >> think you should just decide which of the 47,123 things proposed so >> far is best and insist on that. I trust that you will make a good >> decision even if it's different than the decision that I would have >> made. > > If we use one syntax for every case, possible approaches that we can choose > are mini-language, JSON, etc. Since my previous proposal covers only very > simple cases, extra syntax needs to be supported for more complicated cases. > My plan was to add the hooks so that the developers can choose their own > syntax. But that might confuse users. > > Now I'm thinking that mini-language is a better choice. JSON has some good > points, but its big problem is that the setting value is likely to be very > long. > For example, when the master needs to wait for one local standby and > at least one from three remote standbys in London data center, the setting > value (synchronous_standby_names) would be > > s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1, > "nodes":["london1", "london2", "london3"]}]}' > > OTOH, the value with mini-language is simple and not so long, as follows. > > s_s_names = '2[local1, 1(london1, london2, london3)]' > > This is why I'm now thinking that mini-language is better. But it's not easy > to completely implement mini-language. There seem to be many problems > that we need to resolve. For example, please imagine the case where > the master needs to wait for at least one from two standbys "tokyo1", "tokyo2" > in Tokyo data center. If Tokyo data center fails, the master needs to > wait for at least one from two standbys "london1", "london2" in London > data center, instead. This case can be configured as follows in mini-language. > > s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]' > > One problem here is: what pg_stat_replication.sync_state value should be > shown for each standby? Which standby should be marked as sync? potential? > any other value like quorum? The current design of pg_stat_replication > doesn't fit complicated sync replication cases, so maybe we need to separate > it into several views. It's almost impossible to completely resolve those problems. > > My current plan for 9.6 is to support the minimal subset of mini-language; > simple syntax of "<n>[name, ...]". "<n>" specifies the number of > sync standbys that the master needs to wait for. "[name, ...]" specifies > the priorities of the listed standbys. This first version supports neither > quorum commit nor nested sync replication configuration like > "<n>[name, <n>[name, ...]]". It just supports very simple > "1-level" configuration. Whatever the solution, I really don't like the idea of changing the definition of s_s_names based on the value of another GUC, mainly because it seems hacky, but also because the name of the GUC stops making sense. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idle In Transaction Session Timeout, revived
On Sun, Jan 31, 2016 at 10:33 PM, Vik Fearing wrote: > Attached is a rebased and revised version of my > idle_in_transaction_session_timeout patch from last year. > > This version does not suffer the problems the old one did where it would > jump out of SSL code thanks to Andres' patch in commit > 4f85fde8eb860f263384fffdca660e16e77c7f76. > > The basic idea is if a session remains idle in a transaction for longer > than the configured time, that connection will be dropped thus releasing > the connection slot and any locks that may have been held by the broken > client. +1 But, IIRC, one of the problems that prevented the adoption of this feature is the addition of a gettimeofday() call after every SQL command receipt. Have you already resolved that problem? Or do we not need to care about it because it's almost harmless? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
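One plausible way to avoid a per-command gettimeofday() is to lean on the existing timeout machinery in utils/timeout.h and arm a timer only when the backend actually goes idle inside a transaction. A sketch under that assumption follows; the timeout id, GUC variable, and handler are illustrative, not committed code, and the id would have to be registered via RegisterTimeout() at backend start.

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/latch.h"
    #include "utils/timeout.h"
    #include "access/xact.h"

    /* proposed GUC, in milliseconds; 0 disables the feature */
    int         IdleInTransactionSessionTimeout = 0;

    static volatile bool idle_xact_timeout_pending = false;

    /* runs from the timeout signal path; just flag and wake the backend */
    static void
    IdleInTransactionSessionTimeoutHandler(void)
    {
        idle_xact_timeout_pending = true;
        InterruptPending = true;
        SetLatch(MyLatch);
    }

    /*
     * In PostgresMain(), just before blocking to read the next command:
     *
     *    if (IdleInTransactionSessionTimeout > 0 &&
     *        IsTransactionOrTransactionBlock())
     *        enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
     *                             IdleInTransactionSessionTimeout);
     *
     * and as soon as the next command arrives:
     *
     *    disable_timeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, false);
     */

With that shape, the clock is only read when arming or firing a timer, never on the hot path of command receipt.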
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Tue, Feb 2, 2016 at 3:59 AM, Thom Brown wrote:
> public | pgbench_accounts_pkey | index | thom | pgbench_accounts | 214 MB |
> public | pgbench_branches_pkey | index | thom | pgbench_branches | 24 kB |
> public | pgbench_tellers_pkey | index | thom | pgbench_tellers | 48 kB |

I see the same. I use my regular SQL query to see the breakdown of leaf/internal/root pages:

postgres=# with tots as (
               SELECT count(*) c,
                      avg(live_items) avg_live_items,
                      avg(dead_items) avg_dead_items,
                      u.type,
                      r.oid
               from (select c.oid, c.relpages,
                            generate_series(1, c.relpages - 1) i
                     from pg_index i
                     join pg_opclass op on i.indclass[0] = op.oid
                     join pg_am am on op.opcmethod = am.oid
                     join pg_class c on i.indexrelid = c.oid
                     where am.amname = 'btree') r,
                    lateral (select * from bt_page_stats(r.oid::regclass::text, i)) u
               group by r.oid, type)
           select ct.relname table_name,
                  tots.oid::regclass::text index_name,
                  (select relpages - 1 from pg_class c where c.oid = tots.oid) non_meta_pages,
                  upper(type) page_type,
                  c npages,
                  to_char(avg_live_items, '990.999'),
                  to_char(avg_dead_items, '990.999'),
                  to_char(c / sum(c) over (partition by tots.oid) * 100, '990.999') || ' %' as prop_of_index
           from tots
           join pg_index i on i.indexrelid = tots.oid
           join pg_class ct on ct.oid = i.indrelid
           where tots.oid = 'pgbench_accounts_pkey'::regclass
           order by ct.relnamespace, table_name, index_name, npages, type;

    table_name    │      index_name       │ non_meta_pages │ page_type │ npages │ to_char  │ to_char  │ prop_of_index
 ─────────────────┼───────────────────────┼────────────────┼───────────┼────────┼──────────┼──────────┼───────────────
 pgbench_accounts │ pgbench_accounts_pkey │         27,421 │ R         │      1 │   97.000 │    0.000 │      0.004 %
 pgbench_accounts │ pgbench_accounts_pkey │         27,421 │ I         │     97 │  282.670 │    0.000 │      0.354 %
 pgbench_accounts │ pgbench_accounts_pkey │         27,421 │ L         │ 27,323 │  366.992 │    0.000 │     99.643 %
(3 rows)

But this looks healthy -- I see the same with master. And since the accounts table is listed as 1281 MB, this looks like a plausible ratio in the size of the table to its primary index (which I would not say is true of an 87MB primary key index). Are you sure you have the details right, Thom? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idle In Transaction Session Timeout, revived
David Steele wrote: > > <...> But what I think really happens is > > some badly-written Java application loses track of a connection > > someplace and just never finds it again. <...> I've seen that also, plenty of times. > That's what I've seen over and over again. And then sometimes it's not > a badly-written Java application, but me, and in that case I definitely > want the connection killed. Without logging, if you please. So the way to escape audit logging is to open a transaction, steal some data, then leave the connection open so that it's not logged when it's killed? -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, 4 Feb 2016 12:59:03 +0300 Michael Paquier wrote: > Do you have a test case in mind for it? > > > 1) Better to raise more meaningful error when IPC::Run is absent. > > This has been discussed before, and as far as I recall the current > behavior has been concluded as being fine. That's where > --enable-tap-tests becomes meaningful. Really, it is not so hard to add configure checks for perl modules. And we need to test not only for IPC::Run, but for Test::More too, because some Linux distributions put modules which come with perl into a separate package. The only problem is that most m4 files with tests for perl modules which can be found on the Internet have a GPL license, so someone has either to write their own and publish it under the PostgreSQL license, or contact the author of one of them and ask to publish it under the PostgreSQL license. The first seems to be much easier. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas wrote: > On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao wrote: >> So you disagree with only third version that I proposed, i.e., >> adding some hooks for sync replication? If yes and you're OK >> with the first and second versions, ISTM that we almost reached >> consensus on the direction of multiple sync replication feature. >> The first version can cover "one local and one remote sync standbys" case, >> and the second can cover "one local and at least one from several remote >> standbys" case. I'm thinking to focus on the first version now, >> and then we can work on the second to support the quorum commit > > Well, I think the only hard part of the third problem is deciding on > what syntax to use. It seems like a waste of time to me to go to a > bunch of trouble to implement #1 and #2 using one syntax and then have > to invent a whole new syntax for #3. Seriously, this isn't that hard: > it's not a technical problem. It's just that we've got a bunch of > people who can't agree on what syntax to use. IMO, you should just > pick something. You're presumably the committer for this patch, and I > think you should just decide which of the 47,123 things proposed so > far is best and insist on that. I trust that you will make a good > decision even if it's different than the decision that I would have > made. If we use one syntax for every case, possible approaches that we can choose are mini-language, JSON, etc. Since my previous proposal covers only very simple cases, extra syntax needs to be supported for more complicated cases. My plan was to add the hooks so that the developers can choose their own syntax. But that might confuse users. Now I'm thinking that mini-language is a better choice. JSON has some good points, but its big problem is that the setting value is likely to be very long. For example, when the master needs to wait for one local standby and at least one from three remote standbys in London data center, the setting value (synchronous_standby_names) would be s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1, "nodes":["london1", "london2", "london3"]}]}' OTOH, the value with mini-language is simple and not so long, as follows. s_s_names = '2[local1, 1(london1, london2, london3)]' This is why I'm now thinking that mini-language is better. But it's not easy to completely implement mini-language. There seem to be many problems that we need to resolve. For example, please imagine the case where the master needs to wait for at least one from two standbys "tokyo1", "tokyo2" in Tokyo data center. If Tokyo data center fails, the master needs to wait for at least one from two standbys "london1", "london2" in London data center, instead. This case can be configured as follows in mini-language. s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]' One problem here is: what pg_stat_replication.sync_state value should be shown for each standby? Which standby should be marked as sync? potential? any other value like quorum? The current design of pg_stat_replication doesn't fit complicated sync replication cases, so maybe we need to separate it into several views. It's almost impossible to completely resolve those problems. My current plan for 9.6 is to support the minimal subset of mini-language; simple syntax of "<n>[name, ...]". "<n>" specifies the number of sync standbys that the master needs to wait for. "[name, ...]" specifies the priorities of the listed standbys. This first version supports neither quorum commit nor nested sync replication configuration like "<n>[name, <n>[name, ...]]". It just supports very simple "1-level" configuration. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
On Thu, Feb 4, 2016 at 6:40 PM, Andres Freund wrote: > > On 2016-02-04 18:21:41 +0530, Amit Kapila wrote: > > I think generally it is good idea, but one thing worth a thought is that > > by doing so, we need to acquire all WAL Insertion locks every > > LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for > > every slot, do you think it is matter of concern in any way for write > > workloads or it won't effect as we need to do this periodically? > > Michael and I just had an in-person discussion, and one of the topics > was that. The plan was basically to adapt the patch to: > 1) Store the progress lsn inside the wal insert lock > 2) Change the HasActivity API to return the last LSN at which there >was activity, instead of a boolean. > 3) Individually acquire each insert lock's lwlock to get its progress >LSN, but not the exclusive insert lock. We need the lwlock to avoid >a torn 8-byte read on some platforms. > > I think that mostly should address your concerns? > Yes, this sounds better and in any case we can do some benchmarks to verify the same once the patch is in shape. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
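To spell out point 3 in code, roughly as it might look inside xlog.c where WALInsertLocks is visible (progress_lsn is the field this plan would add to each WALInsertLock, and the function name here is made up):

    /* Return the latest progress LSN recorded across all WAL insert locks. */
    static XLogRecPtr
    GetProgressRecPtr(void)
    {
        XLogRecPtr  result = InvalidXLogRecPtr;
        int         i;

        for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
        {
            XLogRecPtr  progress;

            /*
             * Taking the lock's own LWLock in shared mode is enough: we only
             * need an untorn 8-byte read, not to block WAL insertions.
             */
            LWLockAcquire(&WALInsertLocks[i].l.lock, LW_SHARED);
            progress = WALInsertLocks[i].l.progress_lsn;
            LWLockRelease(&WALInsertLocks[i].l.lock);

            if (result < progress)
                result = progress;
        }

        return result;
    }

Since only the per-lock LWLock is touched, concurrent writers keep inserting WAL while the checkpointer polls this every LOG_SNAPSHOT_INTERVAL_MS, which is what should keep the overhead negligible for write workloads.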
Re: [HACKERS] Idle In Transaction Session Timeout, revived
On 2/4/16 5:00 AM, Alvaro Herrera wrote: > David Steele wrote: > >>> <...> But what I think really happens is >>> some badly-written Java application loses track of a connection >>> someplace and just never finds it again. <...> > > I've seen that also, plenty of times. > >> That's what I've seen over and over again. And then sometimes it's not >> a badly-written Java application, but me, and in that case I definitely >> want the connection killed. Without logging, if you please. > > So the way to escape audit logging is to open a transaction, steal some > data, then leave the connection open so that it's not logged when it's > killed? Well, of course I was joking, but even so I only meant the disconnect shouldn't be logged to save me embarrassment. But you are probably joking as well. Oh, what a tangled web. -- -David da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: [HACKERS] pg_dump data structures for triggers
Vik Fearing writes: > On 02/04/2016 01:44 AM, Tom Lane wrote: >> I'm looking into fixing the problem reported here: >> http://www.postgresql.org/message-id/1445a624-d09f-4b51-9c41-46ba1e2d6...@neveragain.de >> namely that if we split a view into a table + rule (because of circular >> dependencies), parallel pg_restore fails to ensure that it creates any >> triggers for the view only after creating the rule. If it tries to >> create the triggers first, the backend may barf because they're the wrong >> type of triggers for a plain table. > No objections to this, but my "better idea" is simply to allow INSTEAD > OF triggers on tables like discussed last year. > http://www.postgresql.org/message-id/14c6fe168a9-1012-10...@webprd-a87.mail.aol.com That sounds like a new feature, and not something we'd want to backpatch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Thu, Feb 4, 2016 at 11:55 AM, Ashutosh Bapat wrote: > A query with FOR UPDATE/SHARE will be considered parallel unsafe in > has_parallel_hazard_walker() and root->glob->parallelModeOK will be marked > false. This implies that none of the base relations and hence join relations > will be marked as consider_parallel. IIUC your logic, none of the queries > with FOR UPDATE/SHARE will get a local path which is marked parallel_safe > and thus join will not be pushed down. Why do you think we need to skip > paths that aren't parallel_safe? I have left aside this change in the latest > patches. I changed this back before committing but, ah nuts, you're right. Sigh. Sorry. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2016/02/04 19:44, Rushabh Lathia wrote: > I just started reviewing this and realized that the patch is not getting > applied cleanly on the latest source; it has some conflicts. Can you > please upload the correct version of the patch? I rebased the patch to the latest HEAD. Attached is a rebased version of the patch. You don't need to apply the patch fdw-foreign-modify-rmh-v2.patch attached before. Thanks for the review! Best regards, Etsuro Fujita [Attachment, flattened and truncated in this archive: the rebased patch adds deparsePushedDownUpdateSql() and deparsePushedDownDeleteSql() to contrib/postgres_fdw/deparse.c, deparsing whole remote UPDATE/DELETE statements (SET list, optional WHERE clause built from remote_conds, optional RETURNING list), plus the matching expected-output changes in contrib/postgres_fdw/expected/postgres_fdw.out showing "Foreign Update" plans with the pushed-down Remote SQL.]
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Fri, Feb 5, 2016 at 3:17 AM, Robert Haas wrote: > > On Thu, Feb 4, 2016 at 7:00 AM, Amit Kapila wrote: > > [ new patch ] > > I've committed this after heavy rewriting. In particular, I merged > two loops, used local variables to get rid of the very long and quite > repetitive lines, and did various cleanup on the documentation and > comments. > I think there is a typo in the committed code; a patch to fix the same is attached. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com typo_named_extensions_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> -Original Message- > From: Jim Nasby [mailto:jim.na...@bluetreble.com] > Sent: Friday, February 05, 2016 9:17 AM > To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org; Robert Haas > Cc: Amit Langote > Subject: Re: [HACKERS] Way to check whether a particular block is on the > shared_buffer? > > On 2/4/16 12:30 AM, Kouhei Kaigai wrote: > >> 2. A feature to suspend i/o write-out towards particular blocks > >> that are registered by other concurrent backends, until it is > >> unregistered (usually, at the end of P2P DMA). > >> ==> to be discussed. > > I think there's still a race condition here though... > > A > finds buffer not in shared buffers > > B > reads buffer in > modifies buffer > starts writing buffer to OS > > A > Makes call to block write, but write is already in process; thinks > writes are now blocked > Reads corrupted block > Much hilarity ensues > > Or maybe you were just glossing over that part for brevity. > > ... > > > I tried to design a draft of enhancement to realize the above i/o write-out > > suspend/resume, in the least invasive way possible. > > > > ASSUMPTION: I intend to implement this feature as part of an extension, > > because these i/o suspend/resume checks are pure overhead > > for the core features, unless an extension utilizes them. > > > > Three functions shall be added: > > > > extern int GetStorageMgrNumbers(void); > > extern f_smgr GetStorageMgrHandlers(int smgr_which); > > extern void SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers); > > > > As literal, GetStorageMgrNumbers() returns the number of storage managers > > currently installed. It always returns 1 right now. > > GetStorageMgrHandlers() returns the currently configured f_smgr table for > > the supplied smgr_which. It allows extensions to know the current configuration > > of the storage manager, even if another extension already modified it. > > SetStorageMgrHandlers() assigns the supplied 'smgr_handlers', instead of > > the current one. > > If an extension wants to intermediate 'smgr_write', the extension will replace > > 'smgr_write' with its own function, then call the original function, likely > > mdwrite, from the alternative function. > > > > In this case, the call chain shall be: > > > > FlushBuffer, and others... > > +-- smgrwrite(...) > > +-- (extension's own function) > > +-- mdwrite > > ISTR someone (Robert Haas?) complaining that this method of hooks is > cumbersome to use and can be fragile if multiple hooks are being > installed. So maybe we don't want to extend its usage... > > I'm also not sure whether this is better done with an smgr hook or a > hook into shared buffer handling... > # sorry, I overlooked the later part of your reply. I can agree that smgr hooks shall be primarily designed to make storage systems pluggable, even if we can use these hooks for the suspend & resume of write i/o stuff. In addition, "pluggable storage" is a long-standing feature, even though it is not certain whether the existing smgr hooks are a good starting point. It may be a risk if we implement a grand feature on top of the hooks but outside their primary purpose. So, my preference is a mechanism to hook buffer writes to implement this feature. (Or, maybe a built-in write i/o suspend/resume mechanism if it has nearly zero cost when no extension activates the feature.) One downside of this approach is the larger number of hook points. We have to deploy the hook near the existing smgrwrite calls in LocalBufferAlloc and FlushRelationBuffers, in addition to FlushBuffer, at least.
Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Tue, Feb 2, 2016 at 8:57 AM, Alvaro Herrera wrote: > Haribabu Kommi wrote: > >> Hi, Do you have any further comments on the patch that needs to be >> taken care? > > I do. I think the jsonb functions you added should be added to one of > the json .c files instead, since they seem of general applicability. Moved these functions into the jsonb_util.c file. > But actually, I don't think you have ever replied to the question of why > are you using JSON in the first place; isn't a normal array suitable? It was discussed and decided to use JSON for options instead of an array in [1]; because of that I changed it. > The callback stuff is not documented in check_hba() at all. Can you > please add an explanation just above the function, so that people trying > to use it know what can the callback be used for? Also a few lines > above the callback itself would be good. Added some details explaining the callback function. >Also, the third argument of > check_hba() is a translatable message so you should enclose it in _() so > that it is picked up for translation. The "skipped"/"matched" words > (and maybe others?) need to be marked similarly. The mode column values (skipped/matched) and the reason-for-mismatch details are now enclosed in _() for translation. Do you find any other details that need to be enclosed? > That "Failed" in the errmsg in pg_hba_lookup() should be lowercase. Corrected. > Moving to next CF. Thanks. The updated patch is attached with the above corrections. This patch needs to be applied on top of the discard_hba_and_ident_cxt patch that was posted earlier. [1] - http://www.postgresql.org/message-id/5547db0a.2020...@gmx.net Regards, Hari Babu Fujitsu Australia pg_hba_lookup_poc_v12.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)
On Thu, Jan 28, 2016 at 7:36 AM, Etsuro Fujita wrote: > Attached is that version of the patch. > > I think that postgres_fdw might be able to insert a tableoid value in the > returned slot in e.g., postgresExecForeignInsert if AFTER ROW Triggers or > RETURNING expressions reference that value, but I didn't do anything about > that. Thanks. I went with the earlier version, but split it into two commits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> On 2/4/16 12:30 AM, Kouhei Kaigai wrote: > >> 2. A feature to suspend i/o write-out towards particular blocks > >> that are registered by other concurrent backends, until it is > >> unregistered (usually, at the end of P2P DMA). > >> ==> to be discussed. > > I think there's still a race condition here though... > > A > finds buffer not in shared buffers > > B > reads buffer in > modifies buffer > starts writing buffer to OS > > A > Makes call to block write, but write is already in process; thinks > writes are now blocked > Reads corrupted block > Much hilarity ensues > > Or maybe you were just glossing over that part for brevity. > Thanks, this part was not clear from my previous description. At the time when B starts writing the buffer to the OS, the extension will catch this i/o request using a hook around smgrwrite, and the mechanism then registers the block, blocking P2P DMA requests during B's write operation. (Of course, it unregisters the block at the end of smgrwrite.) So, even if A wants to issue P2P DMA concurrently, it cannot register the block until B's write operation completes. In practice, this operation shall be a "try lock", because B's write operation implies the existence of the buffer in main memory, so A does not need to wait for B's write operation if it switches the DMA source from SSD to main memory. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei > ... > > > I tried to design a draft of enhancement to realize the above i/o write-out > > suspend/resume, in the least invasive way possible. > > > > ASSUMPTION: I intend to implement this feature as part of an extension, > > because these i/o suspend/resume checks are pure overhead > > for the core features, unless an extension utilizes them. > > > > Three functions shall be added: > > > > extern int GetStorageMgrNumbers(void); > > extern f_smgr GetStorageMgrHandlers(int smgr_which); > > extern void SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers); > > > > As literal, GetStorageMgrNumbers() returns the number of storage managers > > currently installed. It always returns 1 right now. > > GetStorageMgrHandlers() returns the currently configured f_smgr table for > > the supplied smgr_which. It allows extensions to know the current configuration > > of the storage manager, even if another extension already modified it. > > SetStorageMgrHandlers() assigns the supplied 'smgr_handlers', instead of > > the current one. > > If an extension wants to intermediate 'smgr_write', the extension will replace > > 'smgr_write' with its own function, then call the original function, likely > > mdwrite, from the alternative function. > > > > In this case, the call chain shall be: > > > > FlushBuffer, and others... > > +-- smgrwrite(...) > > +-- (extension's own function) > > +-- mdwrite > > ISTR someone (Robert Haas?) complaining that this method of hooks is > cumbersome to use and can be fragile if multiple hooks are being > installed. So maybe we don't want to extend its usage... > > I'm also not sure whether this is better done with an smgr hook or a > hook into shared buffer handling... > -- > Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX > Experts in Analytics, Data Architecture and PostgreSQL > Data in Trouble? Get it in Treble! 
http://BlueTreble.com > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] count_nulls(VARIADIC "any")
I wrote: > Pavel Stehule writes: >> [ num_nulls_v6.patch ] > I started looking through this. It seems generally okay, but I'm not > very pleased with the function name "num_notnulls". I think it would > be better as "num_nonnulls", as I see Oleksandr suggested already. Not hearing any complaints, I pushed it with that change and some other cosmetic adjustments. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
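For the archives, a quick usage note on the committed functions: SELECT num_nonnulls(1, NULL, 2) returns 2, and SELECT num_nulls(1, NULL, 2) returns 1.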
Re: [HACKERS] count_nulls(VARIADIC "any")
On 5 Feb 2016 at 1:33, "Tom Lane" wrote: > > Pavel Stehule writes: > > [ num_nulls_v6.patch ] > > I started looking through this. It seems generally okay, but I'm not > very pleased with the function name "num_notnulls". I think it would > be better as "num_nonnulls", as I see Oleksandr suggested already. I have no problem with it. Regards Pavel > > regards, tom lane
Re: [HACKERS] Incorrect formula for SysV IPC parameters
At Thu, 4 Feb 2016 21:43:04 +0900, Fujii Masao wrote in > On Wed, Feb 3, 2016 at 12:51 PM, Kyotaro HORIGUCHI > wrote: > > Hello, I found that the formulas to calculate SEMMNI and SEMMNS > > are incorrect in 9.2 and later. > > > > http://www.postgresql.org/docs/9.5/static/kernel-resources.html > > > > But actually the number of semaphores PostgreSQL needs is > > calculated as following in 9.4 and later. ... > > So, the formula for SEMMNI should be > > > > ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) > > / 16) > > > > and SEMMNS should have the same fix. > > > > > > In 9.3 and 9.2, the documentation says the same thing but ... > > ceil((max_connections + autovacuum_max_workers + 5) / 16) > > > > In 9.1, NUM_AUXILIARY_PROCS is 3 so the documentation is correct. > > Good catch! Thanks. > ISTM that you also need to change the descriptions about SEMMNI and SEMMNS > under the Table 17-1. Oops! Thank you for pointing it out. The original description doesn't mention the magic number ('5' in the above formulas, which previously was '4'), so I haven't added anything about it. A process whose number is limited by max_worker_processes is called a 'background process' (not 'backend worker') in the documentation, so I used that name to mention it in the additional description. The difference in the body text for 9.2 and 9.3 is only the literal '4' changed to '5' in the formula. regards, -- Kyotaro Horiguchi NTT Open Source Software Center [Two patch attachments, flattened in this archive, both against doc/src/sgml/runtime.sgml: the first (for 9.2/9.3) changes the constant 4 to 5 in the SEMMNI formula "ceil((max_connections + autovacuum_max_workers + 4) / 16)" and in the matching SEMMNS entry; the second (for 9.4 and later) additionally adds max_worker_processes to both formulas and mentions allowed background processes in the surrounding text.]
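As a worked example of the corrected 9.4+ formula, with the default settings max_connections = 100, autovacuum_max_workers = 3, and max_worker_processes = 8: ceil((100 + 3 + 8 + 5) / 16) = ceil(7.25) = 8 semaphore sets, so SEMMNI must be at least 8 and SEMMNS at least 8 * 17 = 136, plus room for other applications.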
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 9:34 AM, Fujii Masao wrote: > Now I'm thinking that mini-language is a better choice. JSON has some good > points, but its big problem is that the setting value is likely to be very > long. > For example, when the master needs to wait for one local standby and > at least one from three remote standbys in London data center, the setting > value (synchronous_standby_names) would be > > s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1, > "nodes":["london1", "london2", "london3"]}]}' > > OTOH, the value with mini-language is simple and not so long, as follows. > > s_s_names = '2[local1, 1(london1, london2, london3)]' Yeah, that was my thought also. Another idea which was suggested is to create a completely new configuration file for this. Most people would only have simple stuff in there, of course, but then you could have the information spread across multiple lines. I don't in the end care very much about how we solve this problem. But I'm glad you agree that whatever we do to solve the simple problem should be a logical subset of what the full solution will eventually look like, not a completely different design. I think that's important. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Thu, Feb 4, 2016 at 8:25 AM, Thom Brown wrote: > > No, I'm not. I've just realised that all I've been checking is the > primary key expecting it to change in size, which is, of course, > nonsense. I should have been creating an index on the bid field of > pgbench_accounts and reviewing the size of that. Right. Because, apart from everything else, unique indexes are not currently supported. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
Hi, I tested the last version, v11, and I have no objections. It is working as expected, all regression tests passed, there is related documentation, and a new test is attached. This patch is ready for committer. Daniel, thank you very much, it is an interesting feature. Regards Pavel
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Hello, I'm studying this. Two hunks in 0003 needed a fix but the other part applied cleanly on master. At Fri, 22 Jan 2016 15:17:51 +0900, Michael Paquier wrote in
Re: [HACKERS] Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
On Sun, Jan 31, 2016 at 7:59 PM, Andreas Karlsson wrote: > Nice work, I like your sorting patches. Thanks. I like your reviews of my sorting patches. :-) -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: PL/Pythonu - function ereport
On Tue, Feb 2, 2016 at 11:52 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> The eventual committer is likely to be much happier with this patch if
>> you guys have achieved consensus among yourselves on the best
>> approach.
>
> Actually I imagine that if there's no agreement between author and first
> reviewer, there might not *be* a committer in the first place. Perhaps
> try to get someone else to think about it and make a decision. It is
> possible that some other committer is able to decide by themselves but I
> wouldn't count on it.

Pavel and I agree that the backward incompatible behavior is cleaner, but it's backward incompatible. Whether that extra cleanness is worth the incompatibility is never obvious. In this case I think it is. But since my opinion is just my opinion, I was planning to make the committer be that someone else that weighs the issues and makes a decision. I'm new around here and picked this patch to get started due to having Python knowledge and the patch seeming self contained enough. Being new makes me wonder all the time "am I just making life difficult for the patch author or is this idea genuinely good and therefore I should push it forward?". I think more beginners that try to do reviews struggle with this. But, let's try to reach a decision.

The existing behaviour dates back to 0bef7ba (Add plpython code) by Bruce. I've added him to the mail, maybe he can help us with a decision. I'll summarize the patch and explain the incompatibility decision with some illustrative code. The patch is trying to make it possible to call ereport from PL/Python and provide the rich ereport information (detail, hint, sqlcode etc.). There are already functions like plpy.info() but they only expose message and they call elog instead of ereport. See the attached incompat.py. Running it produces:

          existing behaviour
PG elog/ereport with message: 1: hi detail: None hint: None
PG elog/ereport with message: ('2: hi', 'another argument') detail: None hint: None
PG elog/ereport with message: ('3: hi', 'another argument', 2) detail: None hint: None
PG elog/ereport with message: ('4: hi', 'another argument', 2, 'lots', 'of', 'arguments') detail: None hint: None

             new behaviour
PG elog/ereport with message: 1: hi detail: None hint: None
PG elog/ereport with message: 2: hi detail: another argument hint: None
PG elog/ereport with message: 3: hi detail: another argument hint: 2
Traceback (most recent call last):
  File "incompat.py", line 43, in
    info_new('4: hi', 'another argument', 2, 'lots', 'of', 'arguments')
TypeError: info_new() takes at most 3 arguments (6 given)

I find the existing behaviour for 2, 3 and 4 unlike other Python APIs I've seen, surprising and not very useful. If I want to log a tuple I can construct and pass a single tuple; I don't see why plpy.info() needs to construct it for me. And for the documented, single argument call nothing changes. The question to Bruce (and others) is: is it ok to change to the new behaviour illustrated and change the meaning of usages like 2, 3 and 4? If we don't want that, the solution Pavel and I see is introducing a parallel API named plpy.raise_info or plpy.rich_info or something like that with the new behaviour, and leaving the existing functions unchanged. Another option is some compatibility GUC, but I don't think it's worth the trouble and confusion.
# simulated PG elog
def elog(message):
    ereport(message)

# simulated PG ereport
def ereport(message, detail=None, hint=None):
    print 'PG elog/ereport with message: %s detail: %s hint: %s' % (message, detail, hint)

# existing code behaves like this:
# takes an unlimited number of arguments
# doesn't take keyword arguments
# makes a string representation of the tuple of all arguments unless that
# tuple has size 1 in which case it only makes a string representation of
# the first one
def info_existing(*args):
    if len(args) == 1:
        # special case added by Peter in 2e3b16
        str_to_log = str(args[0])
    else:
        str_to_log = str(args)
    elog(str_to_log)

# and I'm proposing to change it to do this:
# take 1 required argument and extra optional arguments for every argument
# accepted by ereport
# accepts keyword arguments
# passing too many arguments will get rejected
def info_new(message, detail=None, hint=None):
    ereport(message, detail, hint)

print 'existing behaviour'.center(40)
info_existing('1: hi')
info_existing('2: hi', 'another argument')
info_existing('3: hi', 'another argument', 2)
info_existing('4: hi', 'another argument', 2, 'lots', 'of', 'arguments')
print
print 'new behaviour'.center(40)
info_new('1: hi')  # for the documented single argument case same behaviour as existing
info_new('2: hi', 'another argument')
info_new('3: hi', 'another argument', 2)
info_new('4: hi', 'another argument', 2, 'lots', 'of', 'arguments')
-- Sent via pgsql-hackers
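On the C side, the proposed plpy.info() would become a thin wrapper over ereport(); here is a minimal sketch of that shape (the function name PLy_info is invented for illustration and is not necessarily what the patch uses):

/*
 * Sketch: emit an INFO message through ereport() with optional detail
 * and hint, as plpy.info(message, detail, hint) would do after the
 * proposed change.  The conditional errdetail()/errhint() calls keep
 * those fields unset when the corresponding argument was not supplied.
 */
static void
PLy_info(const char *message, const char *detail, const char *hint)
{
	ereport(INFO,
			(errmsg("%s", message),
			 (detail != NULL) ? errdetail("%s", detail) : 0,
			 (hint != NULL) ? errhint("%s", hint) : 0));
}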
[HACKERS] Comment typo in slot.c
Hi all, I just bumped into the following typo in slot.c: /* * If we'd now fail - really unlikely - we wouldn't know whether this slot * would persist after an OS crash or not - so, force a restart. The -* restart would try to fysnc this again till it works. +* restart would try to fsync this again till it works. */ START_CRIT_SECTION(); Regards, -- Michael diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 251b549..11b44a4 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -984,7 +984,7 @@ CreateSlotOnDisk(ReplicationSlot *slot) /* * If we'd now fail - really unlikely - we wouldn't know whether this slot * would persist after an OS crash or not - so, force a restart. The - * restart would try to fysnc this again till it works. + * restart would try to fsync this again till it works. */ START_CRIT_SECTION(); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, Feb 4, 2016 at 9:18 PM, Victor Wagner wrote:
> It's quite good that the patch sets a standard of using 'use strict;
> use warnings;' in the test scripts.

FWIW, this has been decided as a standard rule for any modules/scripts added to the main tree.

> It is bad that the Postgres-specific perl modules do not have embedded
> documentation. It would be nice to see POD documentation in
> TestLib.pm and PostgresNode.pm instead of just comments. It would be
> much easier for test writers to read documentation using the perldoc
> utility, rather than browsing through the code.

Why not. I am no perlist, but those prove to be helpful; however, those Postgres modules are not aimed at a large audience, so we could live without that for now.

> I think that this patch should be committed as soon as possible in its
> current form (short of the already-reported reject in the PostgresNode.pm
> init function).

Thanks for your enthusiasm. Now, to do a self-critique of my patch:

+ if ($params{allows_streaming})
+ {
+  print $conf "wal_level = hot_standby\n";
+  print $conf "max_wal_senders = 5\n";
+  print $conf "wal_keep_segments = 20\n";
+  print $conf "max_wal_size = 128MB\n";
+  print $conf "shared_buffers = 1MB\n";
+  print $conf "wal_log_hints = on\n";
+  print $conf "hot_standby = on\n";
+ }

This could use more thought, particularly for wal_log_hints, which is not used all the time. I think that we'd actually want to complete that with an optional hash of parameter/values that get appended at the end of the configuration file, then pass wal_log_hints in the tests where it is needed. The default set of parameters is maybe fine if done this way; still, wal_keep_segments could be removed.

+# Tets for timeline switch
+# Encure that a standby is able to follow a newly-promoted standby

Two typos in two lines. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 7:27 PM, Robert Haaswrote: > I don't in the end care very much about how we solve this problem. > But I'm glad you agree that whatever we do to solve the simple problem > should be a logical subset of what the full solution will eventually > look like, not a completely different design. I think that's > important. Yes, please let's use the custom language, and let's not care of not more than 1 level of nesting so as it is possible to represent pg_stat_replication in a simple way for the user. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier wrote: > Yes, please let's use the custom language, and let's not care of not > more than 1 level of nesting so as it is possible to represent > pg_stat_replication in a simple way for the user. "not" is used twice in this sentence in a way that renders me not able to be sure that I'm not understanding it not properly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Thu, Feb 4, 2016 at 4:30 AM, Robert Haaswrote: > On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas wrote: > > On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat > > wrote: > >> PFA patches with naming conventions similar to previous ones. > >> pg_fdw_core_v7.patch: core changes > >> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown > >> pg_join_pd_v7.patch: combined patch for ease of testing. > > > > Hmm, I think that GetPathForEPQRecheck() is a pretty terrible name. > > How about GetExistingJoinPath()? > GetExistingLocalJoinPath()? Used that. > > Oops. Hit Send too soon. Also, how about writing if > (path->param_info != NULL) continue; instead of burying the core of > the function in another level of indentation? Hmm. Done. > I think you should skip > paths that aren't parallel_safe, too, and the documentation should be > clear that this will find an unparameterized, parallel-safe joinpath > if one exists. > A query with FOR UPDATE/SHARE will be considered parallel unsafe in has_parallel_hazard_walker() and root->glob->parallelModeOK will be marked false. This implies that none of the base relations and hence join relations will be marked as consider_parallel. IIUC your logic, none of the queries with FOR UPDATE/SHARE will get a local path which is marked parallel_safe and thus join will not be pushed down. Why do you think we need to skip paths that aren't parallel_safe? I have left aside this change in the latest patches. > > + ForeignPath *foreign_path; > + foreign_path = (ForeignPath > *)joinpath->outerjoinpath; > > Maybe insert a blank line between here, and in the other, similar case. > Done. Patches attached with the previous mail. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
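For readers following the thread, the overall shape of the helper being discussed is roughly the following - a simplified sketch only, with the authoritative version in the attached patches (the copying of the chosen path and the recursive handling of its sub-paths are elided here):

/*
 * Sketch of GetExistingLocalJoinPath(): scan the join relation's
 * pathlist for an unparameterized join path built entirely of local
 * strategies, usable for EPQ rechecks.  Returns NULL if none exists.
 */
Path *
GetExistingLocalJoinPath(RelOptInfo *joinrel)
{
	ListCell   *lc;

	foreach(lc, joinrel->pathlist)
	{
		Path	   *path = (Path *) lfirst(lc);

		/* skip parameterized paths, per the review comment above */
		if (path->param_info != NULL)
			continue;

		/* skip pushed-down foreign join paths; we want a local plan */
		if (IsA(path, ForeignPath))
			continue;

		return path;
	}
	return NULL;
}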
Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
On 2016-02-04 18:21:41 +0530, Amit Kapila wrote: > I think generally it is good idea, but one thing worth a thought is that > by doing so, we need to acquire all WAL Insertion locks every > LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for > every slot, do you think it is matter of concern in any way for write > workloads or it won't effect as we need to do this periodically? Michael and I just had an in-person discussion, and one of the topics was that. The plan was basically to adapt the patch to:
1) Store the progress lsn inside the wal insert lock
2) Change the HasActivity API to return the last LSN at which there was activity, instead of a boolean.
3) Individually acquire each insert lock's lwlock to get its progress LSN, but not the exclusive insert lock. We need the lwlock to avoid a torn 8byte read on some platforms.
I think that mostly should address your concerns? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
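For illustration, the read side of 3) could look roughly like this - a sketch only, where GetProgressRecPtr and the progressAt field are names invented here rather than taken from the actual patch:

/*
 * Sketch: compute the latest progress LSN across all WAL insert locks.
 * Each lock's lwlock is acquired only to avoid a torn 8-byte read of
 * the stored LSN; the insertion lock itself is never taken.
 */
static XLogRecPtr
GetProgressRecPtr(void)
{
	XLogRecPtr	result = InvalidXLogRecPtr;
	int			i;

	for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
	{
		XLogRecPtr	progress;

		LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
		progress = WALInsertLocks[i].l.progressAt;	/* hypothetical field */
		LWLockRelease(&WALInsertLocks[i].l.lock);

		if (result < progress)
			result = progress;
	}
	return result;
}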
Re: [HACKERS] Idle In Transaction Session Timeout, revived
On 02/04/2016 02:24 PM, Fujii Masao wrote: > On Sun, Jan 31, 2016 at 10:33 PM, Vik Fearingwrote: >> Attached is a rebased and revised version of my >> idle_in_transaction_session_timeout patch from last year. >> >> This version does not suffer the problems the old one did where it would >> jump out of SSL code thanks to Andres' patch in commit >> 4f85fde8eb860f263384fffdca660e16e77c7f76. >> >> The basic idea is if a session remains idle in a transaction for longer >> than the configured time, that connection will be dropped thus releasing >> the connection slot and any locks that may have been held by the broken >> client. > > +1 > > But, IIRC, one of the problems that prevent the adoption of this feature is > the addition of gettimeofday() call after every SQL command receipt. > Have you already resolved that problem? Or we don't need to care about > it because it's almost harmless? I guess it would be possible to look at MyBEEntry somehow and pull st_state_start_timestamp from it to replace the call to GetCurrentTimestamp(), but I don't know if it's worth doing that. The extra call only happens if the timeout is enabled anyway, so I don't think it matters enough to be a blocker. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench stats per script & other stuff
Hello Alvaro, Something is wrong with patch d. I noticed two things, 1. the total_weight stuff can overflow, It can generate an error on overflow by checking the total_weight while it is being computed. I've switched total_weight to int64 so it is now really impossible to overflow with the 32 bit int_max limit on weight. 2. the chooseScript stuff is broken, or something. Sorry, probably a <=/< error. I think I've fixed it and I've simplified the code a little bit. Another thing is that the "transaction type" output really deserves some more work. I think "multiple scripts" really doesn't cut it; we should have some YAML-like as in the latency reports, which lists all scripts in use and their weights. For me the current output is clear for the reader, which does not mean that it cannot be improved, but how is rather a matter of taste. I've tried to improve it further, see attached patch. If you want something else, it would help to provide a sample of what you expect. Also, while I have your attention regarding accumulated "technical debt", please have a look at the "desc" argument used in addScript etc. It's pretty ridiculous currently. Maybe findBuiltin / process_builtin / process_file should return a struct containing Command ** and the "desc" string, rather than passing desc as a separate argument. Ok, it can return a pointer to the builtin script. Attached is my version of the patch. While you're messing with it, it'd be nice if you added comments on top of your recently added functions such as findBuiltin, process_builtin, chooseScript. Why not. Find attached an 18-d which addresses these concerns, and an actualized 18-e for the prefix. -- Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ade1b53..780a520 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -262,11 +262,13 @@ pgbench options dbname - -b scriptname - --builtin scriptname + -b scriptname[@weight] + --builtin=scriptname[@weight] Add the specified builtin script to the list of executed scripts. +An optional integer weight after @ allows to adjust the +probability of drawing the script. If not specified, it is set to 1. Available builtin scripts are: tpcb-like, simple-update and select-only. With special name list, show the list of builtin scripts @@ -321,12 +323,14 @@ pgbench options dbname - -f filename - --file=filename + -f filename[@weight] + --file=filename[@weight] Add a transaction script read from filename to the list of executed scripts. +An optional integer weight after @ allows to adjust the +probability of drawing the test. See below for details. @@ -686,9 +690,13 @@ pgbench options dbname What is the Transaction Actually Performed in pgbench? - Pgbench executes test scripts chosen randomly from a specified list. + pgbench executes test scripts chosen randomly + from a specified list. They include built-in scripts with -b and user-provided custom scripts with -f. + Each script may be given a relative weight specified after a + @ so as to change its drawing probability. + The default weight is 1. 
@@ -1135,12 +1143,11 @@ number of clients: 10 number of threads: 1 number of transactions per client: 1000 number of transactions actually processed: 1/1 +latency average = 15.844 ms +latency stddev = 2.715 ms tps = 618.764555 (including connections establishing) tps = 622.977698 (excluding connections establishing) -SQL script 1: builtin: TPC-B (sort of) - - 1 transactions (100.0% of total, tps = 618.764555) - - latency average = 15.844 ms - - latency stddev = 2.715 ms +script statistics: - statement latencies in milliseconds: 0.004386\set nbranches 1 * :scale 0.001343\set ntellers 10 * :scale diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 7eb6a2d..da4f05c 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -38,6 +38,7 @@ #include "portability/instr_time.h" #include +#include #include #include #include @@ -179,6 +180,8 @@ char *login = NULL; char *dbName; const char *progname; +#define WSEP '@'/* weight separator */ + volatile bool timer_exceeded = false; /* flag from signal handler */ /* variable definitions */ @@ -299,23 +302,26 @@ typedef struct static struct { const char *name; + int weight; Command **commands; - StatsData stats; + StatsData stats; } sql_script[MAX_SCRIPTS]; /* SQL script files */ static int num_scripts; /* number of scripts in sql_script[] */ static int num_commands = 0; /* total number of Command structs */ +static int64 total_weight = 0; + static int debug = 0; /* debug flag */ /* Define builtin test
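The selection logic itself is tiny; as a sketch of the approach (the attached patch is authoritative), drawing a script with probability proportional to its weight looks like this:

/*
 * Sketch: pick a script index with probability proportional to its
 * weight.  total_weight is int64 so that summing many 32-bit weights
 * cannot overflow.
 */
static int
chooseScript(TState *thread)
{
	int			i = 0;
	int64		w;

	if (num_scripts == 1)
		return 0;

	w = getrand(thread, 0, total_weight - 1);
	do
	{
		w -= sql_script[i++].weight;
	} while (w >= 0);

	return i - 1;
}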
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, Feb 4, 2016 at 4:43 PM, Victor Wagner wrote: > On Thu, 4 Feb 2016 12:59:03 +0300 > Michael Paquier wrote: >> > 1) Better to raise more meaningful error when IPC::Run is absent. >> >> This has been discussed before, and as far as I recall the current >> behavior has been concluded as being fine. That's where >> --enable-tap-tests becomes meaningful. > > Really, it is not so hard to add configure checks for perl modules. > And we need to test not only for IPC::Run, but for Test::More too, > because some Linux distributions put modules which come with perl into > separate package. The last time we discussed that on this list, we concluded that it was not really necessary to have such checks: for one thing it keeps the code simpler, and this is all leveraged by the presence of --enable-tap-tests anyway, tests which can actually get costly with check-world. But this is digressing from the subject of this thread, which deals with having recovery tests integrated in core... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
On Thu, Feb 4, 2016 at 4:10 PM, Andres Freund wrote:
> On 2016-02-04 18:21:41 +0530, Amit Kapila wrote:
>> I think generally it is good idea, but one thing worth a thought is that
>> by doing so, we need to acquire all WAL Insertion locks every
>> LOG_SNAPSHOT_INTERVAL_MS to check the last_insert_pos for
>> every slot, do you think it is matter of concern in any way for write
>> workloads or it won't effect as we need to do this periodically?
>
> Michael and I just had an in-person discussion, and one of the topics
> was that. The plan was basically to adapt the patch to:
> 1) Store the progress lsn inside the wal insert lock
> 2) Change the HasActivity API to return the last LSN at which there
>    was activity, instead of a boolean.
> 3) Individually acquire each insert lock's lwlock to get its progress
>    LSN, but not the exclusive insert lock. We need the lwlock to avoid
>    a torn 8byte read on some platforms.

4) Switch the condition to decide if a checkpoint should be skipped using the last activity position compared with ProcLastRecPtr in CreateCheckpoint to see if any activity has occurred since the checkpoint record was inserted, and do not care anymore if the previous record and current record are on different segments. This would basically work. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing - V16
Hi, Fabien asked me to post a new version of the checkpoint flushing patch series. While this isn't entirely ready for commit, I think we're getting closer. I don't want to post a full series right now, but my working state is available on
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush
The main changes are that:
1) the significant performance regressions I saw are addressed by changing the wal writer flushing logic
2) The flushing API moved up a couple layers, and now deals with buffer tags, rather than the physical files
3) Writes from checkpoints, bgwriter and files are flushed, configurable by individual GUCs. Without that I still saw the spikes in a lot of circumstances.
There's also a more experimental reimplementation of bgwriter, but I'm not sure it's realistic to polish that up within the constraints of 9.6. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On 4 February 2016 at 15:07, Peter Geogheganwrote: > On Tue, Feb 2, 2016 at 3:59 AM, Thom Brown wrote: >> public | pgbench_accounts_pkey | index | thom | pgbench_accounts | 214 MB | >> public | pgbench_branches_pkey | index | thom | pgbench_branches | 24 kB | >> public | pgbench_tellers_pkey | index | thom | pgbench_tellers | 48 kB | > > I see the same. > > I use my regular SQL query to see the breakdown of leaf/internal/root pages: > > postgres=# with tots as ( > SELECT count(*) c, > avg(live_items) avg_live_items, > avg(dead_items) avg_dead_items, > u.type, > r.oid > from (select c.oid, > c.relpages, > generate_series(1, c.relpages - 1) i > from pg_index i > join pg_opclass op on i.indclass[0] = op.oid > join pg_am am on op.opcmethod = am.oid > join pg_class c on i.indexrelid = c.oid > where am.amname = 'btree') r, > lateral (select * from bt_page_stats(r.oid::regclass::text, i)) u > group by r.oid, type) > select ct.relname table_name, > tots.oid::regclass::text index_name, > (select relpages - 1 from pg_class c where c.oid = tots.oid) non_meta_pages, > upper(type) page_type, > c npages, > to_char(avg_live_items, '990.999'), > to_char(avg_dead_items, '990.999'), > to_char(c/sum(c) over(partition by tots.oid) * 100, '990.999') || ' > %' as prop_of_index > from tots > join pg_index i on i.indexrelid = tots.oid > join pg_class ct on ct.oid = i.indrelid > where tots.oid = 'pgbench_accounts_pkey'::regclass > order by ct.relnamespace, table_name, index_name, npages, type; > table_name│ index_name │ non_meta_pages │ page_type > │ npages │ to_char │ to_char │ prop_of_index > ──┼───┼┼───┼┼──┼──┼─── > pgbench_accounts │ pgbench_accounts_pkey │ 27,421 │ R > │ 1 │ 97.000 │0.000 │0.004 % > pgbench_accounts │ pgbench_accounts_pkey │ 27,421 │ I > │ 97 │ 282.670 │0.000 │0.354 % > pgbench_accounts │ pgbench_accounts_pkey │ 27,421 │ L > │ 27,323 │ 366.992 │0.000 │ 99.643 % > (3 rows) > > But this looks healthy -- I see the same with master. And since the > accounts table is listed as 1281 MB, this looks like a plausible ratio > in the size of the table to its primary index (which I would not say > is true of an 87MB primary key index). > > Are you sure you have the details right, Thom? *facepalm* No, I'm not. I've just realised that all I've been checking is the primary key expecting it to change in size, which is, of course, nonsense. I should have been creating an index on the bid field of pgbench_accounts and reviewing the size of that. Now I've checked it with the latest patch, and can see it working fine. Apologies for the confusion. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikovawrote: > I fixed it in the new version (attached). Some quick remarks on your V2.0: * Seems unnecessary that _bt_binsrch() is passed a real pointer by all callers. Maybe the one current posting list caller _bt_findinsertloc(), or its caller, _bt_doinsert(), should do this work itself: @@ -373,7 +377,17 @@ _bt_binsrch(Relation rel, * scan key), which could be the last slot + 1. */ if (P_ISLEAF(opaque)) + { + if (low <= PageGetMaxOffsetNumber(page)) + { + IndexTuple oitup = (IndexTuple) PageGetItem(page, PageGetItemId(page, low)); + /* one excessive check of equality. for possible posting tuple update or creation */ + if ((_bt_compare(rel, keysz, scankey, page, low) == 0) + && (IndexTupleSize(oitup) + sizeof(ItemPointerData) < BTMaxItemSize(page))) + *updposing = true; + } return low; + } * ISTM that you should not use _bt_compare() above, in any case. Consider this: postgres=# select 5.0 = 5.000; ?column? ── t (1 row) B-Tree operator class indicates equality here. And yet, users will expect to see the original value in an index-only scan, including the trailing zeroes as they were originally input. So this should be a bit closer to HeapSatisfiesHOTandKeyUpdate() (actually, heap_tuple_attr_equals()), which looks for strict binary equality for similar reasons. * Is this correct?: @@ -555,7 +662,9 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup) * it off the old page, not the new one, in case we are not at leaf * level. */ - state->btps_minkey = CopyIndexTuple(oitup); + ItemId iihk = PageGetItemId(opage, P_HIKEY); + IndexTuple hikey = (IndexTuple) PageGetItem(opage, iihk); + state->btps_minkey = CopyIndexTuple(hikey); How this code has changed from the master branch is not clear to me. I understand that this code in incomplete/draft: +#define MaxPackedIndexTuplesPerPage\ + ((int) ((BLCKSZ - SizeOfPageHeaderData) / \ + (sizeof(ItemPointerData But why is it different to the old (actually unchanged) MaxIndexTuplesPerPage? I would like to see comments explaining your understanding, even if they are quite rough. Why did GIN never require this change to a generic header (itup.h)? Should such a change live in that generic header file, and not another one more localized to nbtree? * More explanation of the design would be nice. I suggest modifying the nbtree README file, so it's easy to tell what the current design is. It's hard to follow this from the thread. When I reviewed Heikki's B-Tree patches from a couple of years ago, we spent ~75% of the time on design, and only ~25% on code. * I have a paranoid feeling that the deletion locking protocol (VACUUMing index tuples concurrently and safely) may need special consideration here. Basically, with the B-Tree code, there are several complicated locking protocols, like for page splits, page deletion, and interlocking with vacuum ("super exclusive lock" stuff). These are why the B-Tree code is complicated in general, and it's very important to pin down exactly how we deal with each. Ideally, you'd have an explanation for why your code was correct in each of these existing cases (especially deletion). With very complicated and important code like this, it's often wise to be very clear about when we are talking about your design, and when we are talking about your code. It's generally too hard to review both at the same time. 
Ideally, when you talk about your design, you'll be able to say things like "it's clear that this existing thing is correct; at least we have no complaints from the field. Therefore, it must be true that my new technique is also correct, because it makes that general situation no worse". Obviously that kind of rigor is just something we aspire to, and still fall short of at times. Still, it would be nice to specifically see a reason why the new code isn't special from the point of view of the super-exclusive lock thing (which is what I mean by deletion locking protocol + special consideration). Or why it is special, but that's okay, or whatever. This style of review is normal when writing B-Tree code. Some other things don't need this rigor, or have no invariants that need to be respected/used. Maybe this is obvious to you already, but it isn't obvious to me. It's okay if you don't know why, but knowing that you don't have a strong opinion about something is itself useful information. * I see you disabled the LP_DEAD thing; why? Just because that made bugs go away? * Have you done much stress testing? Using pgbench with many concurrent VACUUM FREEZE operations would be a good idea, if you haven't already, because that is insistent about getting super exclusive locks, unlike regular VACUUM. * Are you keeping the restriction of 1/3 of a buffer
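(Coming back to the binary-equality point above: the kind of comparison meant is the datumIsEqual() check used by heap_tuple_attr_equals() for HOT, not the opclass comparator. A sketch follows; index_attr_binary_equal is a hypothetical helper written only for illustration:)

/*
 * Sketch: test one attribute of two index tuples for strict binary
 * equality, so that e.g. numeric 5.0 and 5.000 compare as different
 * even though the B-Tree opclass considers them equal.
 */
static bool
index_attr_binary_equal(TupleDesc tupdesc, int attrnum,
						IndexTuple a, IndexTuple b)
{
	Datum		datum1,
				datum2;
	bool		isnull1,
				isnull2;

	datum1 = index_getattr(a, attrnum, tupdesc, &isnull1);
	datum2 = index_getattr(b, attrnum, tupdesc, &isnull2);

	if (isnull1 || isnull2)
		return isnull1 && isnull2;

	return datumIsEqual(datum1, datum2,
						tupdesc->attrs[attrnum - 1]->attbyval,
						tupdesc->attrs[attrnum - 1]->attlen);
}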
Re: [HACKERS] UNIQUE capability to hash indexes
Shubham Barai writes: > I wanted to know if anyone is working on these projects from todo list. > 1.Add UNIQUE capability to hash indexes > 2.Allow multi-column hash indexes Not that I've heard of. It's very hard to muster any enthusiasm for improving hash indexes unless their lack of WAL-logging is fixed first. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNIQUE capability to hash indexes
On 2/4/16 1:12 PM, Tom Lane wrote: > Shubham Barai writes: >> I wanted to know if anyone is working on these projects from todo list. >> 1.Add UNIQUE capability to hash indexes >> 2.Allow multi-column hash indexes > > Not that I've heard of. It's very hard to muster any enthusiasm for > improving hash indexes unless their lack of WAL-logging is fixed first. +1. I currently steer customers away from hash indexes and have never used them for any project myself. They are simply not safe. -- -David da...@pgmasters.net
[HACKERS] UNIQUE capability to hash indexes
Hello hackers, I wanted to know if anyone is working on these projects from todo list. 1.Add UNIQUE capability to hash indexes 2.Allow multi-column hash indexes I couldn't find any discussion about these projects. Thanks, Shubham Barai
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, 4 Feb 2016 18:33:27 +0300 Michael Paquier wrote:
> > Really, it is not so hard to add configure checks for perl modules.
> > And we need to test not only for IPC::Run, but for Test::More too,
> > because some Linux distributions put modules which come with perl into
> > separate package.
> The last time we discussed that on this list, we concluded that it was
> not really necessary to have such checks: for one thing it keeps the
> code simpler, and this is all leveraged by the presence of
> --enable-tap-tests anyway, tests which can actually get costly with
> check-world. But this is digressing from the subject of this thread,
> which deals with having recovery tests integrated in core...

Of course, such configure tests should be run only if --enable-tap-tests is passed to the configure script. It would look like this:

if test "$enable_tap_tests" = "yes"; then
   AX_PROG_PERL_MODULES(Test::More, ,
      AC_MSG_ERROR([Test::More is necessary to run TAP Tests]))
   AX_PROG_PERL_MODULES(IPC::Run, ,
      AC_MSG_ERROR([IPC::Run is necessary to run TAP Tests]))
fi

in configure.in. Maybe it is not strictly necessary, but it is really useful to see such problems as a clear error message during the configure stage, rather than successfully configuring, compiling and running tests, and only then finding out that something was forgotten. I don't see how having such tests in configure.in makes the code more complex. It just prevents configure from finishing successfully if --enable-tap-tests is specified and the required modules are not available. -- Victor Wagner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
This patch adds long-awaited functionality to the PostgreSQL test suite - testing of cluster configurations. It contains a bare minimum of replication and recovery tests, but it should be a good starting point for other people. Really, adding many more tests for replication and recovery is problematic - these tests are resource-hungry and take considerable time even on relatively powerful machines, because they need to create several temporary installations - but it seems to be necessary. So the set of tests included in this patch is a reasonably good choice.

I think that the readability of the tests can be improved a bit, because these tests would serve as an example for all TAP test writers. It's quite good that the patch sets a standard of using 'use strict; use warnings;' in the test scripts. It is bad that the Postgres-specific perl modules do not have embedded documentation. It would be nice to see POD documentation in TestLib.pm and PostgresNode.pm instead of just comments. It would be much easier for test writers to read documentation using the perldoc utility, rather than browsing through the code.

I'll second Stas' suggestion about psql_ok/psql_fail functions. 1. psql_ok instead of just psql would provide visual feedback for the reader of the code. One would see 'here a condition is tested, here something ends with _ok/_fail'. It would be nice that, seeing say "use Test::More tests => 4", one could immediately check: 'Yes, there are three _ok's and one _fail in the script'. 2. I have a use case for psql_fail: in my libpq failover patch there are a number of cases where it should be tested that a connection is not established. But this is rather about further evolution of the TAP test library, not about this set of tests.

I think that this patch should be committed as soon as possible in its current form (short of the already-reported reject in the PostgresNode.pm init function). -- Victor Wagner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 2:49 PM, Michael Paquierwrote: > On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas wrote: >> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier >> wrote: >>> Yes, please let's use the custom language, and let's not care of not >>> more than 1 level of nesting so as it is possible to represent >>> pg_stat_replication in a simple way for the user. >> >> "not" is used twice in this sentence in a way that renders me not able >> to be sure that I'm not understanding it not properly. > > 4 times here. Score beaten. > > Sorry. Perhaps I am tired... I was just wondering if it would be fine > to only support configurations up to one level of nested objects, like > that: > 2[node1, node2, node3] > node1, 2[node2, node3], node3 > In short, we could restrict things so as we cannot define a group of > nodes within an existing group. I see. Such a restriction doesn't seem likely to me to prevent people from doing anything actually useful. But I don't know that it buys very much either. It's often not very much simpler to handle 2 levels than n levels. However, I ain't writing the code so... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier wrote:
> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas wrote:
>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>> wrote:
>>> Yes, please let's use the custom language, and let's not care of not
>>> more than 1 level of nesting so as it is possible to represent
>>> pg_stat_replication in a simple way for the user.
>>
>> "not" is used twice in this sentence in a way that renders me not able
>> to be sure that I'm not understanding it not properly.
>
> 4 times here. Score beaten.
>
> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> to only support configurations up to one level of nested objects, like
> that:
> 2[node1, node2, node3]
> node1, 2[node2, node3], node3
> In short, we could restrict things so as we cannot define a group of
> nodes within an existing group.

No, actually, that's stupid. Having up to two nested levels makes more sense, a quite common case for this feature being something like this: 2{node1,[node2,node3]} In short, sync confirmation is waited for from node1 and (node2 or node3). Flattening groups of nodes with a new catalog will be necessary to make this data easy for users to view:
- group name?
- array of members with nodes/groups
- group type: quorum or priority
- number of items to wait for in this group
-- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idle In Transaction Session Timeout, revived
On 2016-02-04 22:24:50 +0900, Fujii Masao wrote: > But, IIRC, one of the problems that prevent the adoption of this feature is > the addition of gettimeofday() call after every SQL command receipt. > Have you already resolved that problem? Or we don't need to care about > it because it's almost harmless? Well, it only happens when the feature is enabled, not unconditionally. So I don't think that's particularly bad, as long as the feature is not enabled by default. If we feel we need to do something about the gettimeofday() call at some point, I think it'd make a lot of sense to introduce a more centralized "statement stop" time, and an extended pgstat_report_activity() that allows to specify the timestamp. Because right now we'll essentially do gettimeofday() calls successively when starting a command, starting a transaction, committing a transaction, and finishing a command. That's pretty pointless. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
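To sketch that idea concretely (all names below are invented; the current pgstat_report_activity() takes no timestamp argument):

/*
 * Sketch: a shared "statement stop" timestamp, read from the clock at
 * most once and then reused by end-of-command, end-of-transaction and
 * activity reporting, instead of each path calling GetCurrentTimestamp()
 * (and thus gettimeofday()) separately.
 */
static TimestampTz stmtStopTimestamp = 0;

/* return the shared stop time, reading the clock only on first call */
static TimestampTz
GetStatementStopTimestamp(void)
{
	if (stmtStopTimestamp == 0)
		stmtStopTimestamp = GetCurrentTimestamp();
	return stmtStopTimestamp;
}

/* forget the cached value when the next statement starts */
static void
ResetStatementStopTimestamp(void)
{
	stmtStopTimestamp = 0;
}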
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 10:40 PM, Robert Haaswrote: > On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier > wrote: >> Yes, please let's use the custom language, and let's not care of not >> more than 1 level of nesting so as it is possible to represent >> pg_stat_replication in a simple way for the user. > > "not" is used twice in this sentence in a way that renders me not able > to be sure that I'm not understanding it not properly. 4 times here. Score beaten. Sorry. Perhaps I am tired... I was just wondering if it would be fine to only support configurations up to one level of nested objects, like that: 2[node1, node2, node3] node1, 2[node2, node3], node3 In short, we could restrict things so as we cannot define a group of nodes within an existing group. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/01/29 17:52, Ashutosh Bapat wrote: On Fri, Jan 29, 2016 at 2:05 PM, Etsuro Fujita> wrote: On 2016/01/29 1:26, Ashutosh Bapat wrote: Here is the summary of changes from the last set of patches 2. Included fix for EvalPlanQual in postgres_fdw - an alternate local path is obtained from the list of paths linked to the joinrel. Since this is done before adding the ForeignPath, we should be a local path available for given join. I looked at that code in the patch (ie, postgresRecheckForeignScan and the helper function that creates a local join path for a given foreign join path.), briefly. Maybe I'm missing something, but I think that is basically the same as the fix I proposed for addressing this issue, posted before [1], right? Yes, although I have added functions to copy the paths, not consider pathkeys and change the relevant members of the paths. Sorry if I have missed giving you due credits. If so, my concern is, the helper function probably wouldn't extend to the parameterized-foreign-join-path cases, though that would work well for the unparameterized-foreign-join-path cases. We don't support parameterized-foreign-join paths for 9.6? If we do not find a local path with given parameterization, it means there are other local parameterized paths which are superior to it. This possibly indicates that there will be foreign join parameterised paths which are superior to this parameterized path, so we basically do not create foreign join path with that parameterization. The latest version of the postgres_fdw join pushdown patch will support only the unparameterized-path case, so we don't have to consider this, but why do you think the superiority of parameterizations is preserved between remote joining and local joining? Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcommitfest reply having corrupted subject line
On Thu, Feb 4, 2016 at 7:26 AM, Noah Mischwrote: > The following message, which bears "User-Agent: pgcommitfest", > > > http://www.postgresql.org/message-id/flat/20160202164101.1291.30526.p...@coridan.postgresql.org > > added spaces after commas in its subject line, compared to the subject > line of > its In-Reply-To message. > > new subject line: Re: Add generate_series(date, date) and > generate_series(date, date, integer) > previous subject line: Re: Add generate_series(date,date) and > generate_series(date,date,integer) > > I see no way to manually alter the subject line from the > commitfest.postgresql.org review UI. Is commitfest.postgresql.org > initiating > this change internally? > That is weird. The CF app certainly doesn't do that intentionally - it copies it from the archives. And if I look in the db on the cf app, it has the subject without the space as the field that it copied :O The code is line 355-358 at: http://git.postgresql.org/gitweb/?p=pgcommitfest2.git;a=blob;f=pgcommitfest/commitfest/views.py;h=b4f19b2db470e5269943418b2402ca9ddfff0dff;hb=fec3b2431730c131a206a170a99a7610cdbacc6b#l355 the "subject" field in the db that we copy does not have the spaces... I honestly have no idea where they are coming from :O I'm guessing it must be something internally in the python libraries that create the MIME. Have you seen this with any other messages, that you can recall, or just this one? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
Hi. I’ve looked over proposed patch and migrated my shell tests scripts that i’ve used for testing twophase commits on master/slave to this test framework. Everything looks mature, and I didn’t encountered any problems with writing new tests using this infrastructure. From my point of view I don’t see any problems to commit this patches in their current state. Also some things that came into mind about test suite:
0) There are several routines that does actual checking, like is/command_ok/command_fails. I think it will be very handy to have wrappers psql_ok/psql_fails that calls psql through the command_ok/fails.
1) Better to raise more meaningful error when IPC::Run is absent.
2) --enable-tap-tests deserves mention in test/recovery/README and more obvious error message when one trying to run make check in test/recovery without --enable-tap-tests.
3) Is it hard to give ability to run TAP tests in extensions?
4) It will be handy if make check will write path to log files in case of failed test.
5) psql() accepts database name as a first argument, but everywhere in tests it is ‘postgres’. Isn’t it simpler to store dbname in connstr, and have separate function to change database?
6) Clean logs on prove restart? Clean up tmp installations?
7) Make check sets PGPORT PG_REGRESS for prove. Is it necessary?
> On 22 Jan 2016, at 09:17, Michael Paquier wrote:
> On Mon, Dec 21, 2015 at 4:45 PM, Michael Paquier wrote:
>> As this thread is stalling a bit, please find attached a series of
>> patch gathering all the pending issues for this thread:
>> - 0001, fix config_default.pl for MSVC builds to take into account TAP tests
>> - 0002, append a node name in get_new_node (per Noah's request)
>> - 0003, the actual recovery test suite
>> Hopefully this facilitates future reviews.
> Patch 2 has been pushed as c8642d9 (thanks Alvaro). The remaining two
> patches still apply and pass cleanly.
> --
> Michael
Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/04 8:00, Robert Haas wrote: On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas wrote: On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat wrote: PFA patches with naming conventions similar to previous ones. pg_fdw_core_v7.patch: core changes pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown pg_join_pd_v7.patch: combined patch for ease of testing. Thank you for working on this, Ashutosh and Robert! I've not looked at the patches closely yet, but ISTM the patches are really in good shape! Hmm, I think that GetPathForEPQRecheck() is a pretty terrible name. How about GetExistingJoinPath()? +1 Oops. Hit Send too soon. Also, how about writing if (path->param_info != NULL) continue; instead of burying the core of the function in another level of indentation? I think you should skip paths that aren't parallel_safe, too, and the documentation should be clear that this will find an unparameterized, parallel-safe joinpath if one exists. + ForeignPath *foreign_path; + foreign_path = (ForeignPath *)joinpath->outerjoinpath; Maybe insert a blank line between here, and in the other, similar case. * Is it safe to replace outerjoinpath with its fdw_outerpath the following way? I think that if the join relation represented by outerjoinpath has local conditions that can't be executed remotely, we have to keep outerjoinpath in the path tree; we will otherwise fail to execute the local conditions. No?

+ /*
+  * If either inner or outer path is a ForeignPath corresponding to
+  * a pushed down join, replace it with the fdw_outerpath, so that we
+  * maintain path for EPQ checks built entirely of local join
+  * strategies.
+  */
+ if (IsA(joinpath->outerjoinpath, ForeignPath))
+ {
+  ForeignPath *foreign_path;
+  foreign_path = (ForeignPath *)joinpath->outerjoinpath;
+  if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
+   joinpath->outerjoinpath = foreign_path->fdw_outerpath;
+ }

* IIUC, that function will be used by custom joins, so I think it would be better to put that function somewhere in the /optimizer directory (pathnode.c?). Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] silent data loss with ext4 / all current versions
On Tue, Feb 2, 2016 at 9:59 AM, Andres Freund wrote: > On 2016-02-02 09:56:40 +0900, Michael Paquier wrote: >> And there is no actual risk of data loss > > Huh? More precisely: what I mean here is that should an OS crash or a power failure happen, we would fall back to recovery at next restart, so we would not actually *lose* data. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] silent data loss with ext4 / all current versions
On Tue, Feb 2, 2016 at 4:20 PM, Michael Paquier wrote: > Not wrong, and this leads to the following: > void rename_safe(const char *old, const char *new, bool isdir, int elevel); > Controlling elevel is necessary per the multiple code paths that would > use it. Some use ERROR, most of them FATAL, and a bit of WARNING. Does > that look fine? After really coding it, I finished with the following thing: +int +rename_safe(const char *old, const char *new) There is no need to extend that for directories, well we could of course but all the renames happen on files so I see no need to make that more complicated. More refactoring of the other rename() calls could be done as well by extending rename_safe() with a flag to fsync the data within a critical section, particularly for the replication slot code. I have let that out to not complicate more the patch. -- Michael diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c index f6da673..11287aa 100644 --- a/src/backend/access/transam/timeline.c +++ b/src/backend/access/transam/timeline.c @@ -418,9 +418,10 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, TLHistoryFilePath(path, newTLI); /* - * Prefer link() to rename() here just to be really sure that we don't - * overwrite an existing file. However, there shouldn't be one, so - * rename() is an acceptable substitute except for the truly paranoid. + * Prefer link() to rename_safe() here just to be really sure that we + * don't overwrite an existing file. However, there shouldn't be one, + * so rename_safe() is an acceptable substitute except for the truly + * paranoid. */ #if HAVE_WORKING_LINK if (link(tmppath, path) < 0) @@ -430,7 +431,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, tmppath, path))); unlink(tmppath); #else - if (rename(tmppath, path) < 0) + if (rename_safe(tmppath, path) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not rename file \"%s\" to \"%s\": %m", @@ -508,9 +509,10 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size) TLHistoryFilePath(path, tli); /* - * Prefer link() to rename() here just to be really sure that we don't - * overwrite an existing logfile. However, there shouldn't be one, so - * rename() is an acceptable substitute except for the truly paranoid. + * Prefer link() to rename_safe() here just to be really sure that we + * don't overwrite an existing logfile. However, there shouldn't be + * one, so rename_safe() is an acceptable substitute except for the + * truly paranoid. */ #if HAVE_WORKING_LINK if (link(tmppath, path) < 0) @@ -520,7 +522,7 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size) tmppath, path))); unlink(tmppath); #else - if (rename(tmppath, path) < 0) + if (rename_safe(tmppath, path) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not rename file \"%s\" to \"%s\": %m", diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index a2846c4..b61fcc7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3251,9 +3251,10 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, } /* - * Prefer link() to rename() here just to be really sure that we don't - * overwrite an existing logfile. However, there shouldn't be one, so - * rename() is an acceptable substitute except for the truly paranoid. + * Prefer link() to rename_safe() here just to be really sure that we + * don't overwrite an existing logfile. 
However, there shouldn't be + * one, so rename_safe() is an acceptable substitute except for the + * truly paranoid. */ #if HAVE_WORKING_LINK if (link(tmppath, path) < 0) @@ -3268,7 +3269,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, } unlink(tmppath); #else - if (rename(tmppath, path) < 0) + if (rename_safe(tmppath, path) < 0) { if (use_lock) LWLockRelease(ControlFileLock); @@ -3792,7 +3793,7 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) * flag, rename will fail. We'll try again at the next checkpoint. */ snprintf(newpath, MAXPGPATH, "%s.deleted", path); - if (rename(path, newpath) != 0) + if (rename_safe(path, newpath) != 0) { ereport(LOG, (errcode_for_file_access(), @@ -3800,10 +3801,12 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) path))); return; } + rc = unlink(newpath); #else rc = unlink(path); #endif + if (rc != 0) { ereport(LOG, @@ -5291,7 +5294,7 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) * re-enter archive recovery mode in a subsequent crash. */ unlink(RECOVERY_COMMAND_DONE); - if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0) + if (rename_safe(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0) ereport(FATAL, (errcode_for_file_access(),
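For the archives, the general shape of such a rename_safe() - a sketch only, with error reporting trimmed down to bare -1 returns; the attached patch is what counts:

/*
 * Sketch: rename old to new, then make the rename itself durable by
 * fsyncing the renamed file and its containing directory, so that the
 * new directory entry survives an OS crash or power failure.
 */
static int
rename_safe(const char *old, const char *new)
{
	char		parentpath[MAXPGPATH];
	int			fd;

	if (rename(old, new) != 0)
		return -1;

	/* flush the renamed file itself */
	fd = OpenTransientFile((char *) new, O_RDWR | PG_BINARY, 0);
	if (fd < 0)
		return -1;
	if (pg_fsync(fd) != 0)
	{
		CloseTransientFile(fd);
		return -1;
	}
	CloseTransientFile(fd);

	/* then its parent directory, to persist the new directory entry */
	strlcpy(parentpath, new, MAXPGPATH);
	get_parent_directory(parentpath);
	if (strlen(parentpath) == 0)
		strlcpy(parentpath, ".", MAXPGPATH);
	fd = OpenTransientFile(parentpath, O_RDONLY | PG_BINARY, 0);
	if (fd < 0)
		return -1;
	(void) pg_fsync(fd);		/* some platforms cannot fsync directories */
	CloseTransientFile(fd);

	return 0;
}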