Re: [HACKERS] parallel query vs extensions
On 15 April 2016 at 12:45, Jeff Janes wrote:

> I think there are a lot of extensions which create functions which
> could benefit from being declared parallel safe. But how does one go
> about doing that?
>
> create extension xml2;
> select xml_valid(filler),count(*) from pgbench_accounts group by 1;
> Time: 3205.830 ms
>
> alter function xml_valid (text) parallel safe;
>
> select xml_valid(filler),count(*) from pgbench_accounts group by 1;
> Time: 1803.936 ms
>
> (Note that I have no particular interest in the xml2 extension, it
> just provides a convenient demonstration of the general principle)
>
> Should every relevant contrib extension get a version bump with a
> transition file which is nothing but a list of "alter function blah
> blah blah parallel safe"?
>
> And what of non-contrib extensions? Is there some clever alternative
> to having a bunch of pseudo versions, like "1.0", "1.0_96", "1.1",
> "1.1_9.6", "1.2", "1.2_96", etc.?

What I've done in the past for similar problems is preprocess the
extension--x.y.sql files in the Makefile to conditionally remove
unsupported syntax, functions, etc.

It's rather less than perfect, because if the user pg_upgrades they won't
get the now-supported options. They'll have the old-version extension on
the new server version and would have to drop and re-create the extension
to get the new version's contents. You could create variant
pseudo-extensions to make this clearer - myext95--1.0.sql,
myext96--1.0.sql, etc. - but there's still no way to ALTER EXTENSION to
upgrade.

Pseudo-versions like you suggest will probably work, but the extension
machinery doesn't understand them, and you can only specify one of them as
the default in the control file.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
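For the version-bump approach discussed above, the transition file is just a script named for the version pair. A minimal sketch, using a made-up extension `myext` and made-up function names (the real script would list every function the extension ships):

```sql
-- myext--1.0--1.1.sql: hypothetical transition script whose only job is
-- to mark the extension's existing functions parallel safe for 9.6.
-- Installed next to myext--1.0.sql; applied via ALTER EXTENSION myext
-- UPDATE TO '1.1'. Function names here are illustrative only.
ALTER FUNCTION myext_validate(text) PARALLEL SAFE;
ALTER FUNCTION myext_transform(text, integer) PARALLEL SAFE;
```

The control file's `default_version` would then be bumped to '1.1', which is exactly the single-default limitation Craig mentions: only one version can be the default, so pre-9.6 servers would need a different control file or preprocessing.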
[HACKERS] Pgbench with -f and -S
Hi,

Pgbench allows -f and -S to be combined, where the docs say that -S
effectively uses the internal select-only script. Is it okay to assume
that -f is disregarded here? Or are they run in round-robin fashion
(although then, how does it know which read-only part of my script to
run?), or something else like that?

Effectively, I think it should raise a NOTICE that -f is useless here.

- robins
Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3
On April 16, 2016 6:02:39 PM PDT, Tom Lane wrote:
>I wrote:
>> So at this point I'm not sure what to do.  I could back out the back-patch
>> of 44cd47c1d49655c5, which would mean accepting that 9.2/9.3 are broken
>> and will never be fixed for HPPA, as well as any other architectures that
>> use the same fallback memory barrier implementation.  The lack of
>> complaints from the field suggests that nobody would care.  Or I could
>> push forward by back-patching daa7527afc227443 (and a couple of minor
>> follow-on cleanups).  That doesn't seem particularly risky, now that
>> 9.4's been out for awhile, but it's kind of a large back-patch to benefit
>> architectures that apparently no actual users care about.
>
>I went ahead and prepared and tested such a patch; the version for 9.3
>is attached.  (9.2 is identical modulo some pgindent-induced whitespace
>difference.)  This doesn't look too hazardous to me, so I'm thinking
>we should apply it.

I can't look at the patch just now, but the plan sounds good. If you'd
rather have somebody look at the patch first, I can do so tomorrow
morning.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3
I wrote:
> So at this point I'm not sure what to do.  I could back out the back-patch
> of 44cd47c1d49655c5, which would mean accepting that 9.2/9.3 are broken
> and will never be fixed for HPPA, as well as any other architectures that
> use the same fallback memory barrier implementation.  The lack of
> complaints from the field suggests that nobody would care.  Or I could
> push forward by back-patching daa7527afc227443 (and a couple of minor
> follow-on cleanups).  That doesn't seem particularly risky, now that
> 9.4's been out for awhile, but it's kind of a large back-patch to benefit
> architectures that apparently no actual users care about.

I went ahead and prepared and tested such a patch; the version for 9.3
is attached.  (9.2 is identical modulo some pgindent-induced whitespace
difference.)  This doesn't look too hazardous to me, so I'm thinking
we should apply it.

			regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3808836..fc4abbc 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** typedef struct
*** 500,505 ****
--- 500,508 ----
  slock_t    *ShmemLock;
  VariableCache ShmemVariableCache;
  Backend    *ShmemBackendArray;
+ #ifndef HAVE_SPINLOCKS
+ PGSemaphore SpinlockSemaArray;
+ #endif
  LWLock     *LWLockArray;
  slock_t    *ProcStructLock;
  PROC_HDR   *ProcGlobal;
*************** save_backend_variables(BackendParameters
*** 6050,6055 ****
--- 6053,6061 ----
  param->ShmemVariableCache = ShmemVariableCache;
  param->ShmemBackendArray = ShmemBackendArray;
+ #ifndef HAVE_SPINLOCKS
+ param->SpinlockSemaArray = SpinlockSemaArray;
+ #endif
  param->LWLockArray = LWLockArray;
  param->ProcStructLock = ProcStructLock;
  param->ProcGlobal = ProcGlobal;
*************** restore_backend_variables(BackendParamet
*** 6278,6283 ****
--- 6284,6292 ----
  ShmemVariableCache = param->ShmemVariableCache;
  ShmemBackendArray = param->ShmemBackendArray;
+ #ifndef HAVE_SPINLOCKS
+ SpinlockSemaArray = param->SpinlockSemaArray;
+ #endif
  LWLockArray = param->LWLockArray;
  ProcStructLock = param->ProcStructLock;
  ProcGlobal = param->ProcGlobal;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 918ac51..ec8e4f6 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*************** CreateSharedMemoryAndSemaphores(bool mak
*** 103,108 ****
--- 103,109 ----
  * need to be so careful during the actual allocation phase.
  */
  size = 10;
+ size = add_size(size, SpinlockSemaSize());
  size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
  sizeof(ShmemIndexEnt)));
  size = add_size(size, BufferShmemSize());
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 129d9f8..4194db6 100644
*** a/src/backend/storage/ipc/shmem.c
--- b/src/backend/storage/ipc/shmem.c
*************** InitShmemAllocation(void)
*** 116,124 ****
  Assert(shmhdr != NULL);

  /*
! * Initialize the spinlock used by ShmemAlloc.  We have to do the space
! * allocation the hard way, since obviously ShmemAlloc can't be called
! * yet.
  */
  ShmemLock = (slock_t *) (((char *) shmhdr) + shmhdr->freeoffset);
  shmhdr->freeoffset += MAXALIGN(sizeof(slock_t));
--- 116,139 ----
  Assert(shmhdr != NULL);

  /*
! * If spinlocks are disabled, initialize emulation layer.  We have to do
! * the space allocation the hard way, since obviously ShmemAlloc can't be
! * called yet.
! */
! #ifndef HAVE_SPINLOCKS
! {
! PGSemaphore spinsemas;
!
! spinsemas = (PGSemaphore) (((char *) shmhdr) + shmhdr->freeoffset);
! shmhdr->freeoffset += MAXALIGN(SpinlockSemaSize());
! SpinlockSemaInit(spinsemas);
! Assert(shmhdr->freeoffset <= shmhdr->totalsize);
! }
! #endif
!
! /*
! * Initialize the spinlock used by ShmemAlloc; we have to do this the hard
! * way, too, for the same reasons as above.
  */
  ShmemLock = (slock_t *) (((char *) shmhdr) + shmhdr->freeoffset);
  shmhdr->freeoffset += MAXALIGN(sizeof(slock_t));
diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 2864790..874e313 100644
*** a/src/backend/storage/lmgr/spin.c
--- b/src/backend/storage/lmgr/spin.c
***************
*** 25,33 ****
--- 25,48 ----
  #include "miscadmin.h"
  #include "replication/walsender.h"
  #include "storage/lwlock.h"
+ #include "storage/pg_sema.h"
  #include "storage/spin.h"

+ #ifndef HAVE_SPINLOCKS
+ PGSemaphore SpinlockSemaArray;
+ #endif
+
+ /*
+ * Report the amount of shared memory needed to store semaphores for spinlock
+ * support.
+ */
+ Size
+ SpinlockSemaSize(void)
+ {
+ return SpinlockSemas() * sizeof(PGSemaphoreData);
+ }
+
  #ifdef HAVE_SPINLOCKS

  /*
*************** SpinlockSemas(void)
*** 51,71 ****
  int
  SpinlockSemas(void)
  {
! int nsemas;

! /*
! * It would be cleaner to distribute this
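The --disable-spinlocks fallback being sized in this patch can be sketched outside PostgreSQL in plain POSIX terms. This is a hedged illustration with invented names, not the project's real API: each "spinlock" slot maps to a semaphore initialized to 1, so acquire/release become sem_wait/sem_post, and the shared-memory sizing hook (the role of SpinlockSemaSize above) is just slot count times per-semaphore size.

```c
#include <semaphore.h>
#include <stddef.h>

/* Illustrative stand-ins for PostgreSQL's semaphore-backed spinlock
 * emulation; names and layout are assumptions, not the real code. */
#define NUM_EMULATED_SPINLOCKS 128

static sem_t spinlock_sema_array[NUM_EMULATED_SPINLOCKS];

/* Analogue of SpinlockSemaSize(): memory needed for all the slots. */
size_t emulated_spinlock_sema_size(void)
{
    return NUM_EMULATED_SPINLOCKS * sizeof(sem_t);
}

/* Analogue of SpinlockSemaInit(): every slot starts unlocked (count 1). */
void emulated_spinlock_init(void)
{
    for (int i = 0; i < NUM_EMULATED_SPINLOCKS; i++)
        sem_init(&spinlock_sema_array[i], /* pshared */ 0, /* value */ 1);
}

/* A "spinlock" degenerates to an index into the semaphore array. */
typedef int emulated_slock_t;

void emulated_s_lock(emulated_slock_t *lock)
{
    sem_wait(&spinlock_sema_array[*lock]);  /* blocks instead of spinning */
}

void emulated_s_unlock(emulated_slock_t *lock)
{
    sem_post(&spinlock_sema_array[*lock]);
}
```

The point of daa7527afc227443 in this picture is that initializing the index (the emulated_slock_t) is cheap and can happen before the semaphore array exists, which is exactly what the back-branches lacked.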
Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions
On Sat, Apr 16, 2016 at 03:28:23PM -0700, Andres Freund wrote:
> On 2016-04-16 18:23:01 -0400, Noah Misch wrote:
> > On Tue, Apr 12, 2016 at 09:00:57AM -0400, Robert Haas wrote:
> > > On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund wrote:
> > > > I can think of a number of relatively easy ways to address this:
> > > > 1) Just zap (or issue?) all pending flush requests when getting an
> > > >    smgrinval/smgrclosenode
> > > > 2) Do 1), but filter for the closed relnode
> > > > 3) Actually handle the case of the last open segment not being
> > > >    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> > > >
> > > > I'm kind of inclined to do both 3) and 1).
> >
> > Andres, you own the underlying open item, right?
>
> Right.
>
> > Do you plan to implement this fix you have outlined?  If so, when?
>
> Yes, I plan to next week.

Completion by the end of next week will work.  Thanks.  At that time
(2016-04-24T00:00 UTC), this would otherwise have spent twenty-three days
as an open item.  Would you inform this thread if you discover reasons to
think your plan will no longer succeed?
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
Hi,

On 2016-04-16 18:27:06 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2016-04-16 17:52:44 -0400, Tom Lane wrote:
> >> That's more than a 5X penalty, which seems like it would make the
> >> feature unusable; unless there is an argument that that's an extreme
> >> case that wouldn't be representative of most real-world usage.
> >> Which there may well be; I've not been following this thread carefully.
>
> > The 4 % was with the feature disabled (in comparison to before its
> > introduction), we're not sure where that's coming from. But the 5x - and
> > that was just on a mid-sized box - is with the feature enabled.
>
> 128 processors is a mid-sized box?

It has fewer.

> Or if you didn't have 128 processors, why are you testing "-c 128 -j
> 128" cases?

I tried 128 because it's a random number I picked out of my hat. I've
posted various different client numbers elsewhere in the thread. The
machine (a VM, so this isn't the best test!) has 20 cores / 40 hw threads
afaik.

But 128 connections on 40 "cpus" isn't unrealistic. Many workloads have a
lot more connections and concurrent queries than cores - besides often
being operationally easier, it's also sensible from a hardware utilization
perspective. Due to latency effects, individual connections are frequently
idle, even if the client were issuing queries as fast as possible.

> More seriously, the complaints here seem to center on performance in a
> read-only workload; but I don't actually see why you'd want to turn on
> this feature in a read-only, or even read-mostly, workload.

In a purely read-only workload it's surely pointless. But I don't see why
the results would be any better in a 75 read/25 write mix, which is
probably already more write-heavy than a lot of the actual workloads out
there.

> It exists for
> the benefit of people who are trying to keep their pg_xlog/ directories
> reasonably sized, no?  That doesn't sound very read-only-ish to me.

pg_xlog size?  By my understanding it's there to cope with the bloat
introduced by long-running read-only transactions.  Isn't the idea that
"old snapshots" basically don't enforce their xmin anymore, allowing
vacuum/hot pruning?  Such old snapshots continue to work as long as
they're not used to make visibility decisions about pages which have been
modified "recently".  The whole feature is only interesting if such old
snapshots are likely to only access data that's not frequently modified.

- Andres
Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions
On 2016-04-16 18:23:01 -0400, Noah Misch wrote:
> On Tue, Apr 12, 2016 at 09:00:57AM -0400, Robert Haas wrote:
> > On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund wrote:
> > > I can think of a number of relatively easy ways to address this:
> > > 1) Just zap (or issue?) all pending flush requests when getting an
> > >    smgrinval/smgrclosenode
> > > 2) Do 1), but filter for the closed relnode
> > > 3) Actually handle the case of the last open segment not being
> > >    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> > >
> > > I'm kind of inclined to do both 3) and 1).
>
> Andres, you own the underlying open item, right?

Right.

> Do you plan to implement this fix you have outlined?  If so, when?

Yes, I plan to next week.

- Andres
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
Andres Freund writes:
> On 2016-04-16 17:52:44 -0400, Tom Lane wrote:
>> That's more than a 5X penalty, which seems like it would make the
>> feature unusable; unless there is an argument that that's an extreme
>> case that wouldn't be representative of most real-world usage.
>> Which there may well be; I've not been following this thread carefully.

> The 4 % was with the feature disabled (in comparison to before its
> introduction), we're not sure where that's coming from. But the 5x - and
> that was just on a mid-sized box - is with the feature enabled.

128 processors is a mid-sized box?  Or if you didn't have 128 processors,
why are you testing "-c 128 -j 128" cases?

More seriously, the complaints here seem to center on performance in a
read-only workload; but I don't actually see why you'd want to turn on
this feature in a read-only, or even read-mostly, workload.  It exists for
the benefit of people who are trying to keep their pg_xlog/ directories
reasonably sized, no?  That doesn't sound very read-only-ish to me.

			regards, tom lane
Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions
On Tue, Apr 12, 2016 at 09:00:57AM -0400, Robert Haas wrote:
> On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund wrote:
> > I can think of a number of relatively easy ways to address this:
> > 1) Just zap (or issue?) all pending flush requests when getting an
> >    smgrinval/smgrclosenode
> > 2) Do 1), but filter for the closed relnode
> > 3) Actually handle the case of the last open segment not being
> >    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> >
> > I'm kind of inclined to do both 3) and 1).

Andres, you own the underlying open item, right?  Do you plan to implement
this fix you have outlined?  If so, when?

> #3 seems like it's probably about 15 years overdue, so let's do that anyway.
>
> I don't quite understand why #1 fixes the problem - not because I
> doubt you, but just because I haven't studied it much.
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On 2016-04-16 17:52:44 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> >> That is more controversial than the potential ~2% regression for
> >> old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay releasing
> >> that way, and Andres[4] is not.
>
> > FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> > a clear proposal on the table how to solve the scalability issue around
> > MaintainOldSnapshotTimeMapping(). Postponing the optimization around
> > something as trivial as a spinlock around reading an LSN is one thing,
> > postponing something we don't know the solution to is another.
>
> The message Noah cited mentions only a 4% regression, but this one
> seems far worse:
>
> http://www.postgresql.org/message-id/20160413200148.bawmwjdmggbll...@alap3.anarazel.de
>
> That's more than a 5X penalty, which seems like it would make the
> feature unusable; unless there is an argument that that's an extreme
> case that wouldn't be representative of most real-world usage.
> Which there may well be; I've not been following this thread carefully.

The 4 % was with the feature disabled (in comparison to before its
introduction), we're not sure where that's coming from. But the 5x - and
that was just on a mid-sized box - is with the feature enabled.

- Andres
Re: [HACKERS] Memory leak in GIN index build
On 16/04/2016 20:45, Tom Lane wrote:
> Julien Rouhaud writes:
>
>> Also, in dataPlaceToPageLeaf() and ginVacuumPostingTreeLeaf(), shouldn't
>> the START_CRIT_SECTION() calls be placed before the xlog code?
>
> Yeah, they should.  Evidently somebody kluged it to avoid doing a palloc
> inside a critical section, while ignoring every other rule about where to
> place critical sections.  (Maybe we should put an assert about being
> within a critical section into XLogBeginInsert.)

After a quick test, it appears that at least log_smgrcreate() calls
XLogBeginInsert() without being in a critical section, from the various
DDL commands.

> This code is pretty fundamentally broken, isn't it.  elog() calls
> inside a critical section?  Really?  Even worse, a MemoryContextDelete,
> which hardly seems like something that should be assumed error-proof.
>
> Not to mention the unbelievable ugliness of a function that sometimes
> returns with an open critical section (and WAL insertion started) and
> sometimes doesn't.
>
> It kinda looks like this was hacked up without bothering to keep the
> comment block in ginPlaceToPage in sync with reality, either.
>
> I think this needs to be redesigned so that the critical section and WAL
> insertion calls all happen within a single straight-line piece of code.
>
> We could try making that place be ginPlaceToPage().  So far as
> registerLeafRecompressWALData is concerned, that does not look that hard;
> it could palloc and fill the WAL-data buffer the same as it's doing now,
> then pass it back up to be registered (and eventually pfreed) in
> ginPlaceToPage.  However, that would also mean postponing the call of
> dataPlaceToPageLeafRecompress(), which seems much messier.  I assume
> the data it's working from is mostly in the tmpCtx that
> dataPlaceToPageLeaf wants to free before returning.  Maybe we could
> move creation/destruction of the tmpCtx out to ginPlaceToPage, as well?
>
> The other line of thought is to move the WAL work that ginPlaceToPage
> does down into the subroutines.  That would probably result in some
> duplication of code, but it might end up being a cleaner solution.
>
> Anyway, I think memory leakage is just the start of what's broken here,
> and fixing it won't be a very small patch.

I'm not really confident, but I'll give it a try, probably with your
second solution, which looks easier.  Any pointers are welcome, unless
someone more familiar with this code wants to take it.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
Andres Freund writes:
> On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
>> That is more controversial than the potential ~2% regression for
>> old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay releasing
>> that way, and Andres[4] is not.

> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> a clear proposal on the table how to solve the scalability issue around
> MaintainOldSnapshotTimeMapping(). Postponing the optimization around
> something as trivial as a spinlock around reading an LSN is one thing,
> postponing something we don't know the solution to is another.

The message Noah cited mentions only a 4% regression, but this one
seems far worse:

http://www.postgresql.org/message-id/20160413200148.bawmwjdmggbll...@alap3.anarazel.de

That's more than a 5X penalty, which seems like it would make the
feature unusable; unless there is an argument that that's an extreme
case that wouldn't be representative of most real-world usage.
Which there may well be; I've not been following this thread carefully.

			regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> That is more controversial than the potential ~2% regression for
> old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay releasing
> that way, and Andres[4] is not.

FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
a clear proposal on the table for how to solve the scalability issue
around MaintainOldSnapshotTimeMapping().  Postponing the optimization
around something as trivial as a spinlock around reading an LSN is one
thing; postponing something we don't know the solution to is another.

Andres
[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On Wed, Apr 13, 2016 at 03:21:31PM -0500, Kevin Grittner wrote:
> If 2201d801 was not included in your -1 tests, have you identified
> where the 2% extra run time is going on -1 versus reverted?  Since
> several other threads lately have reported bigger variation than
> that based on random memory alignment issues, can we confirm that
> this is a real difference in what is at master's HEAD?

If anyone wishes to confirm that, I recommend this method:
http://www.postgresql.org/message-id/87vbitb2zp@news-spur.riddles.org.uk

PostgreSQL has not required that from contributors, though.  For putative
regressions this small, we've either analyzed them theoretically or just
dismissed them.

The key judgment to finalize here is whether it's okay to release this
feature given its current effect[1], when enabled, on performance.  That
is more controversial than the potential ~2% regression for
old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay releasing
that way, and Andres[4] is not.  If anyone else wants to weigh in, now is
the time.

[1] http://www.postgresql.org/message-id/20160413192110.fogwesjti3kxy...@alap3.anarazel.de
[2] http://www.postgresql.org/message-id/20160413140821.GA6568@alvherre.pgsql
[3] http://www.postgresql.org/message-id/CA+TgmoZqN0xevR+1pZ6j-99-ZCBoOphr-23tiREb+QW1Eu=k...@mail.gmail.com
[4] http://www.postgresql.org/message-id/20160413212356.uv4velailmivn...@alap3.anarazel.de
Re: [HACKERS] Memory leak in GIN index build
Julien Rouhaud writes:
> After some digging, the leak comes from walbufbegin palloc in
> registerLeafRecompressWALData().
> IIUC, walbufbegin isn't pfree-d and can't be before XLogInsert() is
> called, which happens in ginPlaceToPage().

Hmm.

> I don't see a simple way to fix that. My first idea would be to change
> GinBtreeData's placeToPage (and all other needed) functions signature to
> pass a pointer to pfree in ginPlaceToPage() if needed, but it doesn't
> really seem appealing. In any case, I'd be happy to work on a patch if
> needed.

> Also, in dataPlaceToPageLeaf() and ginVacuumPostingTreeLeaf(), shouldn't
> the START_CRIT_SECTION() calls be placed before the xlog code?

Yeah, they should.  Evidently somebody kluged it to avoid doing a palloc
inside a critical section, while ignoring every other rule about where to
place critical sections.  (Maybe we should put an assert about being
within a critical section into XLogBeginInsert.)

This code is pretty fundamentally broken, isn't it.  elog() calls
inside a critical section?  Really?  Even worse, a MemoryContextDelete,
which hardly seems like something that should be assumed error-proof.

Not to mention the unbelievable ugliness of a function that sometimes
returns with an open critical section (and WAL insertion started) and
sometimes doesn't.

It kinda looks like this was hacked up without bothering to keep the
comment block in ginPlaceToPage in sync with reality, either.

I think this needs to be redesigned so that the critical section and WAL
insertion calls all happen within a single straight-line piece of code.

We could try making that place be ginPlaceToPage().  So far as
registerLeafRecompressWALData is concerned, that does not look that hard;
it could palloc and fill the WAL-data buffer the same as it's doing now,
then pass it back up to be registered (and eventually pfreed) in
ginPlaceToPage.  However, that would also mean postponing the call of
dataPlaceToPageLeafRecompress(), which seems much messier.  I assume
the data it's working from is mostly in the tmpCtx that
dataPlaceToPageLeaf wants to free before returning.  Maybe we could
move creation/destruction of the tmpCtx out to ginPlaceToPage, as well?

The other line of thought is to move the WAL work that ginPlaceToPage
does down into the subroutines.  That would probably result in some
duplication of code, but it might end up being a cleaner solution.

Anyway, I think memory leakage is just the start of what's broken here,
and fixing it won't be a very small patch.

			regards, tom lane
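The invariant Tom is arguing for - do every allocation (anything that can fail) before entering the critical section, keep the critical section itself straight-line, and clean up afterwards - can be sketched generically. This is purely illustrative C with made-up names; PostgreSQL's real macros are START_CRIT_SECTION()/END_CRIT_SECTION() and its allocator is palloc, neither of which is reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for a shared page and critical-section bookkeeping. */
typedef struct
{
    char page[64];
    int  crit_depth;            /* mimics the critical-section counter */
} toy_buffer_t;

/* The pattern under discussion: step 1 does everything fallible before
 * the critical section; step 2 is straight-line code with no
 * allocation/error-reporting equivalents; step 3 frees afterwards. */
int toy_place_to_page(toy_buffer_t *buf, const char *payload, size_t len)
{
    if (len > sizeof(buf->page))
        return -1;              /* reject before touching shared state */

    /* Step 1: build the "WAL data" while failure is still recoverable. */
    char *walbuf = malloc(len);
    if (walbuf == NULL)
        return -1;              /* ordinary error path, nothing to undo */
    memcpy(walbuf, payload, len);

    /* Step 2: the critical section proper - modify the page and "log" it,
     * with nothing in between that can allocate or error out. */
    buf->crit_depth++;          /* START_CRIT_SECTION() */
    memcpy(buf->page, walbuf, len);
    buf->crit_depth--;          /* END_CRIT_SECTION() */

    /* Step 3: cleanup happens outside the critical section. */
    free(walbuf);
    return 0;
}
```

In this shape, Tom's complaint about a function that "sometimes returns with an open critical section" cannot arise: the section opens and closes within one straight-line block, and the buffer passed back up for registration is freed by the same caller that allocated it.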
[HACKERS] Memory leak in GIN index build
Hello,

Another colleague provided a report of a memory leak during a GIN index
build.  A test case to reproduce is attached (you need to create a gin
index on the val column after loading).  Sorry, it generates a 24GB
table; memory starts leaking with a 1GB maintenance_work_mem after
reaching 8 or 9 times the m_w_m setting, leading to ~9GB used in the Gin
build temporary context.

After some digging, the leak comes from the walbufbegin palloc in
registerLeafRecompressWALData().  IIUC, walbufbegin isn't pfree-d, and
can't be before XLogInsert() is called, which happens in ginPlaceToPage().

I don't see a simple way to fix that.  My first idea would be to change
GinBtreeData's placeToPage (and all other needed) functions' signature to
pass a pointer to pfree in ginPlaceToPage() if needed, but it doesn't
really seem appealing.  In any case, I'd be happy to work on a patch if
needed.

Also, in dataPlaceToPageLeaf() and ginVacuumPostingTreeLeaf(), shouldn't
the START_CRIT_SECTION() calls be placed before the xlog code?

Regards.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

generate_data.pl
Description: Perl program
[HACKERS] Spinlocks and semaphores in 9.2 and 9.3
This rabbit hole keeps getting deeper and deeper :-(

I realized a couple days ago that it had been close to three years since
I last tried building the further-back branches on my ancient HPPA box.
Not so surprisingly, things were broken: commits 37de8de9e33606a0 et al
had introduced use of memory barriers into 9.2 and 9.3, which up to that
moment had had none, so we'd never bothered to make barrier.h actually
work in those branches.  I back-patched some fixes in barrier.h, which
got things to compile, and then commit 44cd47c1d49655c5, which got things
to actually work ... on that box anyway.

I now see from the buildfarm (and can confirm locally) that
44cd47c1d49655c5 breaks builds with --disable-spinlocks, because it
introduces an initialization of a spinlock before the underlying
PGSemaphore infrastructure can possibly get initialized.  In 9.4 and up,
this works because of daa7527afc227443, which decoupled spinlocks from
semaphores enough that you can do s_init_lock_sema() long before you can
actually use the spinlock.  Come to think of it, that commit also means
that you can get away with using a spinlock you've never done
initialization on at all, which is not so good.

So at this point I'm not sure what to do.  I could back out the
back-patch of 44cd47c1d49655c5, which would mean accepting that 9.2/9.3
are broken and will never be fixed for HPPA, as well as any other
architectures that use the same fallback memory barrier implementation.
The lack of complaints from the field suggests that nobody would care.
Or I could push forward by back-patching daa7527afc227443 (and a couple
of minor follow-on cleanups).  That doesn't seem particularly risky, now
that 9.4's been out for awhile, but it's kind of a large back-patch to
benefit architectures that apparently no actual users care about.

Thoughts?

Independently of that, I think we should fix the --disable-spinlocks
support so that a spinlock containing zero is rejected as not properly
initialized.  A large part of the mess here comes from the fact that HPPA
is our only architecture in which zero is not the correct initialization
state for a spinlock.  I'd like to have some more portable method of
catching failure to initialize a spinlock.  I only propose doing this
part in HEAD, though.

			regards, tom lane
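Tom's last point - making a zeroed, never-initialized spinlock detectable - hinges on giving "unlocked" a nonzero representation, so that freshly zero-filled shared memory is distinguishable from a valid lock. A hedged sketch of that idea, with invented names and values (not what any PostgreSQL branch actually does):

```c
/* Sketch: if "unlocked" is a nonzero magic, a spinlock sitting in
 * zeroed shared memory can be recognized as never-initialized before
 * anyone tries to acquire it.  All names/values here are illustrative. */
typedef unsigned char sketch_slock_t;

#define SKETCH_UNLOCKED 0x5A    /* deliberately nonzero */
#define SKETCH_LOCKED   0xA5

/* Analogue of S_INIT_LOCK(): put the lock into a recognizable state. */
void sketch_s_init_lock(sketch_slock_t *lock)
{
    *lock = SKETCH_UNLOCKED;
}

/* Returns 1 only if the lock holds a state initialization (or a later
 * acquire) could have produced; a zeroed, uninitialized lock fails. */
int sketch_s_lock_is_valid(const sketch_slock_t *lock)
{
    return *lock == SKETCH_UNLOCKED || *lock == SKETCH_LOCKED;
}
```

An acquire path could then assert sketch_s_lock_is_valid() before spinning, turning the silent "works by accident on zero-initialized memory" behavior Tom describes into an immediate, portable failure.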
Re: [HACKERS] Updated backup APIs for non-exclusive backups
On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch wrote:
> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch wrote:
> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> > > > Well, if we *don't* do the rewrite before we release it, then we
> > > > have to instead put information about the new version of the
> > > > functions into the old structure I think.
> > > >
> > > > So I think it's an open issue.
> > >
> > > Works for me...
> > >
> > > [This is a generic notification.]
> > >
> > > The above-described topic is currently a PostgreSQL 9.6 open item.
> > > Magnus, since you committed the patch believed to have created it,
> > > you own this open item.  If that responsibility lies elsewhere,
> > > please let us know whose responsibility it is to fix this.  Since new
> > > open items may be discovered at any time and I want to plan to have
> > > them all fixed well in advance of the ship date, I will appreciate
> > > your efforts toward speedy resolution.  Please present, within 72
> > > hours, a plan to fix the defect within seven days of this message.
> > > Thanks.
> >
> > I won't have time to do the bigger rewrite/reordering by then, but I
> > can certainly commit to having the smaller updates done to cover the
> > new functionality in less than a week.  If nothing else, that'll be
> > something for me to do on the flight over to pgconf.us.
>
> Thanks for that plan; it sounds good.

Here's a suggested patch.  There is some duplication between the
non-exclusive and exclusive backup sections, but I wanted to make sure
that each set of instructions can just be followed top-to-bottom.  I've
also removed some tips that aren't really necessary as part of the
step-by-step instructions, in order to keep things from exploding in
size.
Finally, I've changed references to "backup dump" to just be "backup", because it's confusing to call them something with dumps in when it's not pg_dump. Enough that I got partially confused myself while editing... Comments? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** *** 818,823 test ! -f /mnt/server/archivedir/000100A90065 cp pg_xlog/ --- 818,838 simple. It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step. + + + Low level base backups can be made in a non-exclusive or an exclusive + way. The non-exclusive method is recommended and the exclusive one will + at some point be deprecated and removed. + + + Making a non-exclusive low level backup + + A non-exclusive low level backup is one that allows other + concurrent backups to be running (both those started using + the same backup API and those started using + . + + *** *** 826,857 test ! -f /mnt/server/archivedir/000100A90065 cp pg_xlog/ ! Connect to the database as a user with rights to run pg_start_backup ! (superuser, or a user who has been granted EXECUTE on the function) ! and issue the command: SELECT pg_start_backup('label'); where label is any string you want to use to uniquely ! identify this backup operation. (One good practice is to use the ! full path where you intend to put the backup dump file.) pg_start_backup creates a backup label file, called backup_label, in the cluster directory with ! information about your backup, including the start time and label ! string. The function also creates a tablespace map file, called tablespace_map, in the cluster directory with ! information about tablespace symbolic links in pg_tblspc/ ! if one or more such link is present. Both files are critical to the integrity of the backup, should you need to restore from it. 
- It does not matter which database within the cluster you connect to to - issue this command. You can ignore the result returned by the function; - but if it reports an error, deal with that before proceeding. - - - By default, pg_start_backup can take a long time to finish. This is because it performs a checkpoint, and the I/O required for the checkpoint will be spread out over a significant --- 841,970 ! Connect to the server (it does not matter which database) as a user with ! rights to run pg_start_backup (superuser, or a user who has been granted ! EXECUTE on the function) and issue the command: ! ! SELECT pg_start_backup('label', false, false); ! ! where label is any string you want to use to uniquely ! identify this
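In the non-exclusive form the patch documents, the calling session must stay connected for the whole copy, and pg_stop_backup(false) returns the backup_label and tablespace_map contents that must be stored with the backup copy rather than in the live data directory. An illustrative psql session (assuming the 9.6 function signatures shown in the patch; label and paths are examples only):

```sql
-- Session 1: must remain open from start to stop.
SELECT pg_start_backup('nightly-base', false, false);  -- label, fast?, exclusive?

-- ... meanwhile, copy the data directory with tar/rsync ...

-- Same session: finish the backup. The returned labelfile and spcmapfile
-- columns are written into the *copied* backup as backup_label and
-- tablespace_map, not into the running cluster's directory.
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);
```

This is exactly the property that makes the method non-exclusive: nothing is left behind in the live cluster directory, so several such backups can run concurrently.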
Re: [HACKERS] Disallow unique index on system columns
David Rowley writes: > On 15 April 2016 at 13:43, David Rowley wrote: >> The attached patch just disallows any index containing a system >> column, apart from OID. > Seems I only did half the job as I forgot to check for system > columns that are hidden away inside expressions or predicates. > The attached fixes that. Pushed. I moved the check into DefineIndex, as that's where user-facing complaints about indexes generally ought to be. regards, tom lane
[HACKERS] OS scheduler bugs affecting high-concurrency contention
There is a paper that anyone interested in performance at high concurrency, especially on Linux, should read[1]. While doing other work, a group of researchers saw behavior that they suspected was due to scheduler bugs in Linux. There were no tools that made proving that practical, so they developed such a tool set and used it to find four bugs in the Linux kernel which were introduced in these releases, have not yet been fixed, and have the following maximum impact when running the NAS benchmarks[2], based on running with and without the researchers' fixes for the bugs: 2.6.32: 22% 2.6.38: 13x 3.9: 27x 3.19: 138x That's right -- one of these OS scheduler bugs in production versions of Linux can make one of NASA's benchmarks run for 138 times as long as it does without the bug. I don't feel that I can interpret the results of any high-concurrency benchmarks in a meaningful way without knowing which of these bugs were present in the OS used for the benchmark. Just as an example, it is helpful to know that the benchmarks Andres presented were run on 3.16, so they would have been affected by three of these OS bugs, but not the most severe one. I encourage you to read the paper and draw your own conclusions. Anyway, please don't confuse this thread with the one on the "snapshot too old" patch -- I am still working on that and will post results there when they are ready. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Vivien Quéma, Alexandra Fedorova. The Linux Scheduler: a Decade of Wasted Cores. In Proceedings of the 11th European Conference on Computer Systems, EuroSys'16. April, 2016, London, UK. http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf [2] NAS Parallel Benchmarks. 
http://www.nas.nasa.gov/publications/npb.html
Re: [HACKERS] Refactor pg_dump as a library?
On 04/14/2016 07:28 PM, David Steele wrote: >> As far as I know pg_dump share locks everything before it starts so there shouldn't be issues with concurrent DDL. > Try creating a new inherited table with FKs, etc. during a pg_dump and you'll see lots of fun lock waits. I am pretty sure that it does not lock functions, types or casts. So if you rename a function during pg_dump there is a risk of getting a backup which will fail on restore. Andreas
Re: [HACKERS] Small fix: avoid passing null pointers to memcpy()
On 2016-04-03 09:24, Piotr Stefaniak wrote: > from running the regression test suite (including TAP tests) and also sqlsmith, I've got a couple of places where UBSan reported calls to memcpy() with a null pointer passed as either source or destination. Patch attached. Patch updated. Since this time the patch includes fixes for other standard library function calls (memset and bsearch), I'm renaming the patch file to be more generic. commit 75e849e83c8f7d6b4caab13271b7b0fcf124d497 Author: Piotr Stefaniak Date: Sat Apr 16 14:28:34 2016 +0200 Don't pass null pointers to functions that require the pointers to be non null. diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c index cfb3b50..600824a 100644 --- a/contrib/pgcrypto/px.c +++ b/contrib/pgcrypto/px.c @@ -180,7 +180,7 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen, memset(ivbuf, 0, ivs); if (ivlen > ivs) memcpy(ivbuf, iv, ivs); - else + else if (ivlen > 0) memcpy(ivbuf, iv, ivlen); } diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 29fd31a..2ed56ab 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -305,7 +305,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) /* * copy the scan key, if appropriate */ - if (key != NULL) + if (key != NULL && scan->rs_nkeys > 0) memcpy(scan->rs_key, key, scan->rs_nkeys * sizeof(ScanKeyData)); /* diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 7e37331..e7599be 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -4872,8 +4872,9 @@ SerializeTransactionState(Size maxsize, char *start_address) { if (TransactionIdIsValid(s->transactionId)) workspace[i++] = s->transactionId; - memcpy(&workspace[i], s->childXids, - s->nChildXids * sizeof(TransactionId)); + if (s->nChildXids > 0) + memcpy(&workspace[i], s->childXids, + s->nChildXids * sizeof(TransactionId)); i += s->nChildXids; } Assert(i == nxids); diff --git 
a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index b4dc617..a72795c 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1563,9 +1563,12 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn) /* copy running xacts */ sz = sizeof(TransactionId) * builder->running.xcnt_space; - memcpy(ondisk_c, builder->running.xip, sz); - COMP_CRC32C(ondisk->checksum, ondisk_c, sz); - ondisk_c += sz; + if (sz > 0) + { + memcpy(ondisk_c, builder->running.xip, sz); + COMP_CRC32C(ondisk->checksum, ondisk_c, sz); + ondisk_c += sz; + } /* copy committed xacts */ sz = sizeof(TransactionId) * builder->committed.xcnt; diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index 8aa1f49..25ac26f 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -506,12 +506,14 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid, CurrentSnapshot->xmax = sourcesnap->xmax; CurrentSnapshot->xcnt = sourcesnap->xcnt; Assert(sourcesnap->xcnt <= GetMaxSnapshotXidCount()); - memcpy(CurrentSnapshot->xip, sourcesnap->xip, - sourcesnap->xcnt * sizeof(TransactionId)); + if (sourcesnap->xcnt > 0) + memcpy(CurrentSnapshot->xip, sourcesnap->xip, + sourcesnap->xcnt * sizeof(TransactionId)); CurrentSnapshot->subxcnt = sourcesnap->subxcnt; Assert(sourcesnap->subxcnt <= GetMaxSnapshotSubxidCount()); - memcpy(CurrentSnapshot->subxip, sourcesnap->subxip, - sourcesnap->subxcnt * sizeof(TransactionId)); + if (sourcesnap->subxcnt > 0) + memcpy(CurrentSnapshot->subxip, sourcesnap->subxip, + sourcesnap->subxcnt * sizeof(TransactionId)); CurrentSnapshot->suboverflowed = sourcesnap->suboverflowed; CurrentSnapshot->takenDuringRecovery = sourcesnap->takenDuringRecovery; /* NB: curcid should NOT be copied, it's a local matter */ diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c index 0e815a9..0f26bd8 100644 --- 
a/src/backend/utils/time/tqual.c +++ b/src/backend/utils/time/tqual.c @@ -1639,6 +1639,7 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple) static bool TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num) { + Assert(xip != NULL); return bsearch(&xid, xip, num, sizeof(TransactionId), xidComparator) != NULL; } @@ -1675,7 +1676,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot, return false; } /* check if it's one of our txids, toplevel is also in there */ - else if (TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt)) + else if (snapshot->subxcnt > 0 && TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt)) { bool resolved; CommandId cmin = HeapTupleHeaderGetRawCommandId(tuple); @@ -1717,7 +1718,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
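For context on why these guards are needed: the C standard requires the pointer arguments of memcpy() and bsearch() to be valid even when the element count is zero, so a call like memcpy(dst, NULL, 0) is undefined behavior and UBSan reports it. A minimal standalone sketch of the guard pattern the patch applies throughout (names invented for illustration):

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy an array of xids, guarding the zero-length case: when nxids is 0
 * the source may legitimately be NULL, and per C11 7.24.2.1 passing a
 * null pointer to memcpy() is undefined even with a zero length, so the
 * call must be skipped entirely rather than relied on as a no-op.
 */
static void
copy_xids(unsigned *dst, const unsigned *src, size_t nxids)
{
	if (nxids > 0)
		memcpy(dst, src, nxids * sizeof(unsigned));
}
```

The same reasoning applies to the bsearch() hunk above: the patch both asserts the array is non-null and makes the caller skip the search when the count is zero.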
Re: [HACKERS] Detrimental performance impact of ringbuffers on performance
On Thu, Apr 14, 2016 at 10:22 AM, Peter Geoghegan wrote: > > On Tue, Apr 12, 2016 at 11:38 AM, Andres Freund wrote: > >> And, on the other hand, if we don't do something like that, it will be > >> quite an exceptional case to find anything on the free list. Doing it > >> just to speed up developer benchmarking runs seems like the wrong > >> idea. > > > > I don't think it's just developer benchmarks. I've seen a number of > > customer systems where significant portions of shared buffers were > > unused due to this. > > > > Unless you have an OLTP system, you can right now easily end up in a > > situation where, after a restart, you'll never fill shared_buffers. > > Just because sequential scans for OLAP and COPY use ringbuffers. It sure > > isn't perfect to address the problem while there's free space in s_b, > > but it sure is better than to just continue to have significant portions > > of s_b unused. > > I agree that the ringbuffer heuristics are rather unhelpful in many > real-world scenarios. This is definitely a real problem that we should > try to solve soon. > > An adaptive strategy based on actual cache pressure in the recent past > would be better. Maybe that would be as simple as not using a > ringbuffer based on simply not having used up all of shared_buffers > yet. That might not be good enough, but it would probably still be > better than what we have. > I think that such a strategy could be helpful in certain cases, but I am not sure it would be beneficial every time. There could be cases where we extend ring buffers to use unused buffers in the shared buffer pool for bulk-processing workloads, and immediately after that there is a demand for buffers from other statements. I am not sure, but I think the idea of different kinds of buffer pools could be helpful for some such cases. 
The different kinds of buffer pools could be: ring buffers; extended ring buffers (relations associated with such a pool can bypass ring buffers and use unused shared buffers); retain or keep buffers (frequently accessed relations can be associated with this kind of pool, where buffers stay for a longer time); and a default buffer pool (all relations are associated with it by default, with the same behaviour as today). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
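Peter's adaptive heuristic from the quoted message could be sketched like this (the struct and names are invented for illustration; this is not the buffer manager's actual API): a bulk scan gets a ring buffer only once the pool has actually filled, and while free buffers remain after a restart the scan is allowed to populate shared_buffers instead.

```c
#include <stdbool.h>

/* Invented stand-in for the buffer manager's bookkeeping. */
typedef struct
{
	int	nbuffers;	/* total size of the shared buffer pool */
	int	nfree;		/* buffers still sitting unused on the freelist */
} BufferPoolSketch;

/*
 * Decide whether a bulk scan should use a small ring buffer: only once
 * there is real cache pressure (no free buffers left) does the ring
 * buffer protect the pool; before that, restricting the scan would just
 * leave portions of shared_buffers permanently unused.
 */
static bool
use_ring_buffer(const BufferPoolSketch *pool)
{
	return pool->nfree == 0;
}
```

Amit's objection maps directly onto this sketch: a decision based only on the instantaneous freelist can let one bulk scan consume buffers that other statements will want moments later, which is why a strategy based on recent cache pressure (or distinct pool classes) might be preferable.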
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Fri, Apr 15, 2016 at 11:30 AM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > > At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila wrote: > > > > How about if we do all the parsing stuff in temporary context and then copy > > the results using TopMemoryContext? I don't think it will be a leak in > > TopMemoryContext, because next time we try to check/assign s_s_names, it > > will free the previous result. > > I agree with you. A temporary context for the parser seems > reasonable. TopMemoryContext is created very early in main() so > palloc on it is effectively the same as malloc. > > One problem is that only the top memory block is assumed to be > free()'d, not pfree()'d by guc_set_extra. It makes this quite > ugly.. + newconfig = (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData)); Is there a reason to use malloc here; can't we use palloc directly? Also, for both the functions SyncRepCopyConfig() and SyncRepFreeConfig(), if we directly use TopMemoryContext inside the function (if required) rather than taking it as an argument, then it will simplify the code a lot. +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt) Do we really need the 'bool itself' parameter in the above function? + if (cxt) + oldcxt = MemoryContextSwitchTo(cxt); + list_free_deep(config->members); + + if(oldcxt) + MemoryContextSwitchTo(oldcxt); Why do you need MemoryContextSwitchTo for freeing members? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com