Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Oct 7, 2016 at 3:02 PM, Tomas Vondra wrote:
> I got access to a large machine with 72/144 cores (thanks to Oleg and
> Alexander from Postgres Professional), and I'm running the tests on that
> machine too.
>
> Results from Dilip's workload (with scale 300, unlogged tables) look like
> this:
>
>                       32     64    128    192    224    256    288
>  master           104943 128579  72167 100967  66631  97088  63767
>  granular-locking 103415 141689  83780 120480  71847 115201  67240
>  group-update     105343 144322  92229 130149  81247 126629  76638
>  no-content-lock  103153 140568  80101 119185  70004 115386  66199
>
> So there's some 20-30% improvement for >= 128 clients.

So here we see performance improvement starting at 64 clients; this is
somewhat similar to what Dilip saw in his tests.

> But what I find much more intriguing is the zig-zag behavior. I mean, 64
> clients give ~130k tps, 128 clients only give ~70k but 192 clients jump up
> to >100k tps again, etc.

No clear answer.

> FWIW I don't see any such behavior on pgbench, and all those tests were
> done on the same cluster.
>
>>> With 4.5.5, results for the same benchmark look like this:
>>>
>>>                        64    128    192
>>> master              35693  39822  42151
>>> granular-locking    35370  39409  41353
>>> no-content-lock     36201  39848  42407
>>> group-update        35697  39893  42667
>>>
>>> That seems like a fairly bad regression in the kernel, although I have
>>> not identified the feature/commit causing it (and it's also possible
>>> the issue lies somewhere else, of course).
>>>
>>> With regular pgbench, I see no improvement on any kernel version. For
>>> example on 3.19 the results look like this:
>>>
>>>                        64    128    192
>>> master              54661  61014  59484
>>> granular-locking    55904  62481  60711
>>> no-content-lock     56182  62442  61234
>>> group-update        55019  61587  60485
>>
>> Are the above results with synchronous_commit=off?
>
> No, but I can do that.
>
>>> I haven't done much more testing (e.g. with -N to eliminate
>>> collisions on branches) yet, let's see if it changes anything.
>> Yeah, let us see how it behaves with -N. Also, I think we could try
>> a higher scale factor?
>
> Yes, I plan to do that. In total, I plan to test combinations of:
>
> (a) Dilip's workload and pgbench (regular and -N)
> (b) logged and unlogged tables
> (c) scale 300 and scale 3000 (both fit into RAM)
> (d) sync_commit=on/off

Sounds sensible. Thanks for doing the tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Sun, Sep 11, 2016 at 11:05 AM, Peter Geoghegan wrote:
> So, while there are still a few loose ends with this revision (it
> should still certainly be considered WIP), I wanted to get a revision
> out quickly because V1 has been left to bitrot for too long now, and
> my schedule is very full for the next week, ahead of my leaving to go
> on vacation (which is long overdue). Hopefully, I'll be able to get
> out a third revision next Saturday, on top of the
> by-then-presumably-committed new tape batch memory patch from Heikki,
> just before I leave. I'd rather leave with a patch available that can
> be cleanly applied, to make review as easy as possible, since it
> wouldn't be great to have this V2 with bitrot for 10 days or more.

Heikki committed his preload memory patch a little later than originally
expected, 4 days ago. I attach V3 of my own parallel CREATE INDEX patch,
which should be applied on top of today's git master (there is a bugfix
that reviewers won't want to miss -- commit b56fb691). I have my own
test suite, and have to some extent used TDD for this patch, so rebasing
was not so bad. My tests are rather rough and ready, so I'm not going to
post them here. (Changes in the WaitLatch() API also caused bitrot,
which is now fixed.)

Changes from V2:

* Since Heikki eliminated the need for any extra memtuple "slots"
(memtuples is now only exactly big enough for the initial merge heap),
an awful lot of code could be thrown out that managed sizing memtuples
in the context of the parallel leader (based on trends seen in parallel
workers). I was able to follow Heikki's example by eliminating code for
parallel sorting memtuples sizing. Throwing this code out let me
streamline a lot of stuff within tuplesort.c, which is cleaned up quite
a bit.

* Since this revision was mostly focused on fixing up logtape.c
(rebasing on top of Heikki's work), I also took the time to clarify some
things about how a block-based offset might need to be applied within
the leader.
Essentially, outlining how and where that happens, and where it doesn't
and shouldn't happen. (An offset must sometimes be applied to compensate
for differences in logical BufFile positioning (leader/worker
differences) following the leader's unification of worker tapesets into
one big tapeset of its own.)

* max_parallel_workers_maintenance now supersedes the use of the new
parallel_workers index storage parameter. This matches existing heap
storage parameter behavior, and allows the implementation to add
practically no cycles as compared to the master branch when the use of
parallelism is disabled by setting max_parallel_workers_maintenance to
0.

* New additions to the chapter in the documentation that Robert added a
little while back, "Chapter 15. Parallel Query". It's perhaps a bit of a
stretch to call this feature part of parallel query, but I think that it
works reasonably well. The optimizer does determine the number of
workers needed here, so while it doesn't formally produce a query plan,
I think the implication that it does is acceptable for user-facing
documentation. (Actually, it would be nice if you really could EXPLAIN
utility commands -- that would be a handy place to show information
about how they were executed.) Maybe this new documentation describes
things in what some would consider to be excessive detail for users. The
relatively detailed information added on parallel sorting seemed to be
in the pragmatic spirit of the new chapter 15, so I thought I'd see what
people thought.

Work is still needed on:

* Cost model. Should probably attempt to guess the final index size, and
derive the calculation of the number of workers from that. Also, I'm
concerned that I haven't given enough thought to the low end, where with
default settings most CREATE INDEX statements will use at least one
parallel worker.
* The whole way that I teach nbtsort.c to disallow catalog tables for
parallel CREATE INDEX due to concerns about parallel safety is in need
of expert review, preferably from Robert. It's complicated in a way that
relies on things happening or not happening from a distance.

* Heikki seems to want to change more about logtape.c, and its use of
indirection blocks. That may evolve, but for now I can only target the
master branch.

* More extensive performance testing. I think that this V3 is probably
the fastest version yet, what with Heikki's improvements, but I haven't
really verified that.

--
Peter Geoghegan

0001-Cap-the-number-of-tapes-used-by-external-sorts.patch.gz
Description: GNU Zip compressed data

0002-Add-parallel-B-tree-index-build-sorting.patch.gz
Description: GNU Zip compressed data

0003-Add-force_btree_randomaccess-GUC-for-testing.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] [COMMITTERS] pgsql: Remove -Wl,-undefined,dynamic_lookup in macOS build.
Robert Haas writes:
> On Fri, Oct 7, 2016 at 7:14 PM, Tom Lane wrote:
>> BTW, OS X hasn't got libintl AFAICT:
>> What are you using to get past that?

> MacPorts.

OK, I'll take a look.

			regards, tom lane
Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work
On 2016-10-07 17:12:45 -0400, Tom Lane wrote:
> Andres Freund writes:
>> It's not quite there yet, unfortunately. At the moment
>> pg_atomic_write_u32() is used for local buffers - and we explicitly
>> don't want that to be locking for temp buffers
>> (c.f. 6b93fcd149329d4ee7319561b30fc15a573c6307).
>
> Hm.
>
>> Don't really have a great idea about addressing this, besides either
>> just living with the lock for temp buffers on fallback platforms (which
>> don't have much of a practical relevance), or introduce
>> pg_atomic_unlocked_write_u32() or something. Neither seems great.
>
> Maybe we could hack it with some macro magic that would cause
> pg_atomic_write_u32() to be expanded into a simple assignment in
> localbuf.c only?

I think it's just as well to add a variant that's globally documented to
have no locking, there might be further uses of it. It's already in two
files (bufmgr.c/localbuf.c), and I don't think it's impossible that
further usages will crop up.

Patch that I intend to push soon-ish attached.

Andres

>From b0779abb3add11d4dd745779dd81ea8ecdd00a1d Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Fri, 7 Oct 2016 16:55:15 -0700
Subject: [PATCH] Fix fallback implementation of pg_atomic_write_u32().

I somehow had assumed that in the spinlock (in turn possibly using
semaphores) based fallback atomics implementation 32 bit writes could be
done without a lock. As far as the write goes that's correct, since
postgres supports only platforms with single-copy atomicity for aligned
32bit writes. But writing without holding the spinlock breaks
read-modify-write operations like pg_atomic_compare_exchange_u32(),
since they'll potentially "miss" a concurrent write, which can't happen
in actual hardware implementations.

In 9.6+ when using the fallback atomics implementation this could lead
to buffer header locks not being properly marked as released, and
potentially some related state corruption.
I don't see a related danger in 9.5 (earliest release with the API),
because pg_atomic_write_u32() wasn't used in a concurrent manner there.

The state variable of local buffers, before this change, was manipulated
using pg_atomic_write_u32(), to avoid unnecessary synchronization
overhead. As that'd not be the case anymore, introduce and use
pg_atomic_unlocked_write_u32(), which does not correctly interact with
RMW operations.

This bug only caused issues when postgres is compiled on platforms
without atomics support (i.e. no common new platform), or when compiled
with --disable-atomics, which explains why this wasn't noticed in
testing.

Reported-By: Tom Lane
Discussion: <14947.1475690...@sss.pgh.pa.us>
Backpatch: 9.5-, where the atomic operations API was introduced.
---
 src/backend/port/atomics.c            | 13 +++++++++++++
 src/backend/storage/buffer/bufmgr.c   |  6 +++---
 src/backend/storage/buffer/localbuf.c | 16 ++++++++++++++++
 src/include/port/atomics.h            | 25 +++++++++++++++++++++----
 src/include/port/atomics/fallback.h   |  3 +++
 src/include/port/atomics/generic.h    |  9 +++++++++
 src/include/storage/buf_internals.h   |  3 ++-
 7 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/src/backend/port/atomics.c b/src/backend/port/atomics.c
index 42169a3..d5970e4 100644
--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -104,6 +104,19 @@ pg_atomic_init_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val_)
 	ptr->value = val_;
 }
 
+void
+pg_atomic_write_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val)
+{
+	/*
+	 * One might think that an unlocked write doesn't need to acquire the
+	 * spinlock, but one would be wrong. Even an unlocked write has to cause a
+	 * concurrent pg_atomic_compare_exchange_u32() (et al) to fail.
+	 */
+	SpinLockAcquire((slock_t *) &ptr->sema);
+	ptr->value = val;
+	SpinLockRelease((slock_t *) &ptr->sema);
+}
+
 bool
 pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 									uint32 *expected, uint32 newval)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 2b63cd3..df4c9d7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -821,7 +821,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			Assert(buf_state & BM_VALID);
 			buf_state &= ~BM_VALID;
-			pg_atomic_write_u32(&bufHdr->state, buf_state);
+			pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 		}
 		else
 		{
@@ -941,7 +941,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
 			buf_state |= BM_VALID;
-			pg_atomic_write_u32(&bufHdr->state, buf_state);
+			pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 		}
 		else
 		{
@@ -3167,7 +3167,7 @@ FlushRelationBuffers(Relation rel)
 							   false);
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
-				pg_atomic_write_u32(&bufHdr->state, buf_state);
+
Re: [HACKERS] [COMMITTERS] pgsql: Remove -Wl,-undefined,dynamic_lookup in macOS build.
On Fri, Oct 7, 2016 at 7:14 PM, Tom Lane wrote:
> I wrote:
>> Robert Haas writes:
>>> This broke the build for me. OS X Yosemite, 10.10.5.
>
>> Hm, probably means we need an explicit reference to -lintl when
>> linking libpqwalreceiver.so.
>
> BTW, OS X hasn't got libintl AFAICT:
>
> # ./configure --enable-nls
> ...
> checking for library containing bind_textdomain_codeset... no
> configure: error: a gettext implementation is required for NLS
>
> What are you using to get past that?

MacPorts.

[rhaas pgsql]$ port contents gettext | grep libintl
/opt/local/include/libintl.h
/opt/local/lib/libintl.8.dylib
/opt/local/lib/libintl.a
/opt/local/lib/libintl.dylib
/opt/local/share/gettext/intl/libintl.rc
/opt/local/share/gettext/libintl.jar
[rhaas pgsql]$

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Remove -Wl,-undefined,dynamic_lookup in macOS build.
I wrote:
> Robert Haas writes:
>> This broke the build for me. OS X Yosemite, 10.10.5.

> Hm, probably means we need an explicit reference to -lintl when
> linking libpqwalreceiver.so.

BTW, OS X hasn't got libintl AFAICT:

# ./configure --enable-nls
...
checking for library containing bind_textdomain_codeset... no
configure: error: a gettext implementation is required for NLS

What are you using to get past that?

			regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Remove -Wl,-undefined,dynamic_lookup in macOS build.
Robert Haas writes:
> On Wed, Oct 5, 2016 at 11:04 PM, Tom Lane wrote:
>> Remove -Wl,-undefined,dynamic_lookup in macOS build.

> This broke the build for me. OS X Yosemite, 10.10.5.

Hm, probably means we need an explicit reference to -lintl when
linking libpqwalreceiver.so.

			regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Remove -Wl,-undefined,dynamic_lookup in macOS build.
On Wed, Oct 5, 2016 at 11:04 PM, Tom Lane wrote:
> Remove -Wl,-undefined,dynamic_lookup in macOS build.
>
> We don't need this anymore, and it prevents build-time error checking
> that's usually good to have, so remove it. Undoes one change of commit
> cac765820.
>
> Unfortunately, it's much harder to get a similar effect on other common
> platforms, because we don't want the linker to throw errors for symbols
> that will be resolved in the core backend. Only macOS and AIX expect the
> core backend executable to be available while linking loadable modules,
> so only these platforms can usefully throw errors for unresolved symbols
> at link time.
>
> Discussion: <2652.1475512...@sss.pgh.pa.us>

This broke the build for me. OS X Yosemite, 10.10.5.

$ ./configure --enable-nls --with-libraries=/opt/local/lib --with-includes=/opt/local/include
$ make
...
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -O2 -bundle -multiply_defined suppress
-o libpqwalreceiver.so libpqwalreceiver.o -L../../../../src/port
-L../../../../src/common -L/opt/local/lib -Wl,-dead_strip_dylibs
-L../../../../src/interfaces/libpq -lpq -bundle_loader
../../../../src/backend/postgres
Undefined symbols for architecture x86_64:
  "_libintl_gettext", referenced from:
      _libpqrcv_get_conninfo in libpqwalreceiver.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [libpqwalreceiver.so] Error 1
make[1]: *** [all-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [all-src-recurse] Error 2

Without --enable-nls, or if I back up one commit, it works.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Fixing inheritance merge behavior in ALTER TABLE ADD CONSTRAINT
On Fri, Oct 7, 2016 at 5:09 PM, Tom Lane wrote:
> What seems like a saner answer to me is to change the backend so that it
> will accept these ALTER commands in either order, with the same end state.
> The reason it throws an error now, IMO, is just so that blindly issuing
> the same ALTER ADD CONSTRAINT twice will fail. But we could deal with
> that by saying that it's okay as long as the initially-targeted constraint
> doesn't already have conislocal = true.

+1. Been bitten by this one myself.
Re: [HACKERS] Radix tree for character conversion
On 10/07/2016 06:55 PM, Robert Haas wrote:
> On Fri, Oct 7, 2016 at 6:46 AM, Heikki Linnakangas wrote:
>> Ouch. We should find and document an authoritative source for all the
>> mappings we have... I think the next steps here are:
>>
>> 1. Find an authoritative source for all the existing mappings.
>>
>> 2. Generate the radix tree files directly from the authoritative
>>    sources, instead of the existing *.map files.
>>
>> 3. Completely replace the existing binary-search code with this.
>
> It might be best to convert using the existing map files, and then
> update the mappings later. Otherwise, when things break, you won't
> know what to blame.

I was thinking that we keep the mappings unchanged, but figure out where
we got the mappings we have. An authoritative source may well be "file X
from unicode, with the following tweaks: ...". As long as we have some
way of representing that, in text files, or in perl code, that's OK.

What I don't want is that the current *.map files are turned into the
authoritative source files, that we modify by hand. There are no
comments in them, for starters, which makes hand-editing cumbersome. It
seems that we have edited some of them by hand already, but we should
rectify that.

- Heikki
Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work
Andres Freund writes:
> It's not quite there yet, unfortunately. At the moment
> pg_atomic_write_u32() is used for local buffers - and we explicitly
> don't want that to be locking for temp buffers
> (c.f. 6b93fcd149329d4ee7319561b30fc15a573c6307).

Hm.

> Don't really have a great idea about addressing this, besides either
> just living with the lock for temp buffers on fallback platforms (which
> don't have much of a practical relevance), or introduce
> pg_atomic_unlocked_write_u32() or something. Neither seems great.

Maybe we could hack it with some macro magic that would cause
pg_atomic_write_u32() to be expanded into a simple assignment in
localbuf.c only?

			regards, tom lane
[HACKERS] Fixing inheritance merge behavior in ALTER TABLE ADD CONSTRAINT
I've been looking into the misbehavior complained of here:
https://www.postgresql.org/message-id/CADbMkNPT-Jz5PRSQ4RbUASYAjocV_KHUWapR%2Bg8fNvhUAyRpxA%40mail.gmail.com

I originally thought this was pg_dump's fault, but I now think it's not,
or at least that changing the backend's behavior would be a superior
solution. The problem can be described thus:

create table parent(f1 int);
create table child() inherits(parent);
alter table parent add constraint inh_check check(f1 > 0) not valid;
alter table child add constraint inh_check check(f1 > 0) not valid;

The above sequence fails with

ERROR:  constraint "inh_check" for relation "child" already exists

However, if you swap the order of the two ALTER ADD commands, it
succeeds, with the second ALTER instead saying

NOTICE:  merging constraint "inh_check" with inherited definition

In that case you end up with a child constraint marked as having both
local origin (conislocal=true) and inherited origin (coninhcount=1).
This is the situation that Benedikt's example has, and the problem is
for pg_dump to restore this state.

Now, pg_dump needs to issue separate commands along these lines to
restore NOT VALID constraints, because it has to copy data into the
tables before adding the constraints. (Otherwise, if any rows not
satisfying the constraint are still in the table, the restore would
fail.) The problem from pg_dump's viewpoint is that there's nothing
particularly guaranteeing that the two ALTERs will be issued in the
"right" order. Absent other considerations, it'll tend to issue the
ALTERs in alphabetical order, which means dumping this example exactly
as given will work, but not so much if the table names are like "foo"
and "foo_child".

We could possibly try to fix this by adding dependencies, but the
dependencies would have to say that the parent table's constraint
depends on the child table's constraint, which seems pretty weird.
What seems like a saner answer to me is to change the backend so that it
will accept these ALTER commands in either order, with the same end
state. The reason it throws an error now, IMO, is just so that blindly
issuing the same ALTER ADD CONSTRAINT twice will fail. But we could deal
with that by saying that it's okay as long as the initially-targeted
constraint doesn't already have conislocal = true.

While I was poking at this, I came across a second problem in the same
area, which is that this succeeds:

alter table child add constraint inh_check check(f1 > 0) not valid;
alter table parent add constraint inh_check check(f1 > 0);

After that, you have a parent constraint that claims to be valid and
inheritable, but nonetheless there might be rows in a child table that
don't meet the constraint. That is BAD. It would break planner
deductions that depend on believing that check constraints hold for all
rows emitted by an inherited table scan. (I'm not certain whether there
are any affected cases right now, but it's certainly plausible that
there might be such in future.) Whatever you think of the other
situation, we need to disallow this.

The attached proposed patch (sans test cases as yet) deals with both of
these things, and doesn't change the results of any existing regression
test cases. I'm inclined to treat this as a bug and back-patch it as far
as we allow NOT VALID check constraints, which seems to be 9.2.

Comments?
			regards, tom lane

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index dbd6094..ea06a57 100644
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
*************** static void StoreConstraints(Relation re
*** 105,110 ****
--- 105,111 ----
  				 bool is_internal);
  static bool MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
  							bool allow_merge, bool is_local,
+ 							bool is_initially_valid,
  							bool is_no_inherit);
  static void SetRelationNumChecks(Relation rel, int numchecks);
  static Node *cookConstraint(ParseState *pstate,
*************** AddRelationNewConstraints(Relation rel,
*** 2301,2306 ****
--- 2302,2308 ----
  			 */
  			if (MergeWithExistingConstraint(rel, ccname, expr,
  											allow_merge, is_local,
+ 											cdef->initially_valid,
  											cdef->is_no_inherit))
  				continue;
  		}
*************** AddRelationNewConstraints(Relation rel,
*** 2389,2394 ****
--- 2391,2397 ----
  static bool
  MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
  							bool allow_merge, bool is_local,
+ 							bool is_initially_valid,
  							bool is_no_inherit)
  {
  	bool		found;
*************** MergeWithExistingConstraint(Relation rel
*** 2436,2457 ****
  		if (equal(expr, stringToNode(TextDatumGetCString(val))))
  			found = true;
  	}
  	if (!found || !allow_merge)
  		ereport(ERROR,
  				(errcode(ERRCODE_DUPLICATE_OBJECT),
  				 errmsg("constraint \"%s\" for relation \"%s\" already exists",
  						ccname, RelationGetRelationName(rel))));
! 	tup =
Re: [HACKERS] Illegal SJIS mapping
On 09/07/2016 09:50 AM, Kyotaro HORIGUCHI wrote:
> Hi, I found a useless entry in utf8_to_sjis.map
>
> {0xc19c, 0x815f},
>
> which is apparently illegal as UTF-8, which postgresql deliberately
> refuses. So it should be removed, and the attached patch does that.
> 0x815f (SJIS) is also mapped from 0xefbcbc (U+FF3C FULLWIDTH REVERSE
> SOLIDUS), and that is a correct mapping.

Yes, I think you're right. Committed, thanks!

> By the way, the file comment at the beginning of UCS_to_SJIS.pl is the
> following.
>
> # Generate UTF-8 <--> SJIS code conversion tables from
> # map files provided by Unicode organization.
> # Unfortunately it is prohibited by the organization
> # to distribute the map files. So if you try to use this script,
> # you have to obtain SHIFTJIS.TXT from
> # the organization's ftp site.
>
> The file was found at the following place thanks to google.
>
> ftp://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/
>
> As the URL is showing, or as written in the file
> Public/MAPPINGS/EASTASIA/ReadMe.txt, it is already obsolete and the
> *live* definition *may* be found in the Unicode Character Database.
> But I haven't found SJIS-related information there. If I'm not missing
> anything, the only available authority would be JIS X 0208/0213, but
> what should be implemented seems to be a maybe-modified MS932, for
> which I don't know the authority.
>
> Anyway, I ran UCS_to_SJIS.pl with the SHIFTJIS.TXT above and I got
> quite different mapping files from the current ones. So, I wonder how
> the mappings related to SJIS (and/or EUC-JP) are maintained. If no
> authoritative information is available, the generating script is no
> longer usable. If any other authority is chosen, it is to be modified
> according to whatever the new source format is.

The script is clearly intended to read CP932.TXT, rather than
SHIFTJIS.TXT, despite the comments in it.
CP932.TXT can be found at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

However, running the script with that doesn't produce exactly what we
have in utf8_to_sjis.map, either. It's otherwise the same, but we have
some extra mappings:

- {0xc2a5, 0x5c},
- {0xc2ac, 0x81ca},
- {0xe28096, 0x8161},
- {0xe280be, 0x7e},
- {0xe28892, 0x817c},
- {0xe3809c, 0x8160},

Those mappings were added in commit
a8bd7e1c6e026678019b2f25cffc0a94ce62b24b, back in 2002. The bogus
mapping for the invalid 0xc19c UTF-8 byte sequence was also added by
that commit, as well as a few valid mappings that UCS_to_SJIS.pl also
produces.

I can't judge if those mappings make sense. If we can't find an
authoritative source for them, I suggest that we leave them as they are,
but also hard-code them in UCS_to_SJIS.pl, so that running that script
produces those mappings in utf8_to_sjis.map, even though they are not
present in the CP932.TXT source file.

- Heikki
Re: [HACKERS] pgbench vs. wait events
On Fri, Oct 7, 2016 at 11:51 AM, Jeff Janes wrote:
> What happens if you turn fsync off? Once an xlog file is fully written,
> it is immediately fsynced, even if the backend is holding WALWriteLock
> or wal_insert (or both) at the time, and even if synchronous_commit is
> off. Assuming this machine has a BBU so that it doesn't have to wait
> for disk rotation, still fsyncs are expensive because the kernel has
> to find all the data and get it sent over to the BBU, while holding
> locks.

Scale factor 300, 32 clients, fsync=off:

      5  Lock          | tuple
     18  LWLockTranche | lock_manager
     24  LWLockNamed   | WALWriteLock
     88  LWLockTranche | buffer_content
    265  LWLockTranche | wal_insert
    373  LWLockNamed   | CLogControlLock
    496  LWLockNamed   | ProcArrayLock
    532  Lock          | extend
    540  LWLockNamed   | XidGenLock
    545  Lock          | transactionid
  27067  Client        | ClientRead
  85364                |

But I'm not sure you're right about the way the fsync=off code works. I
think pg_fsync()/pg_fsync_writethrough()/pg_fsync_no_writethrough() look
at enableFsync and just do nothing if it's false.

>> Second, ClientRead becomes a bigger and bigger issue as the number of
>> clients increases; by 192 clients, it appears in 45% of the samples.
>> That basically means that pgbench is increasingly unable to keep up
>> with the server; for whatever reason, it suffers more than the server
>> does from the increasing lack of CPU resources.
>
> I would be careful about that interpretation. If you asked pgbench, it
> would probably have the opposite opinion.

Yeah, that's possible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench vs. wait events
On Fri, Oct 7, 2016 at 1:39 PM, Andres Freund wrote:
> On 2016-10-06 14:38:56 -0400, Robert Haas wrote:
>> Obviously, there's a vast increase in TPS, and the backends seem to
>> spend most of their time actually doing work. ClientRead is now the
>> overwhelmingly dominant wait event, although wal_insert and
>> WALWriteLock contention is clearly still a significant problem.
>> Contention on other locks is apparently quite rare. Notice that
>> client reads are really significant here - more than 20% of the time
>> we sample what a backend is doing, it's waiting for the next query.
>> It seems unlikely that performance on this workload can be improved
>> very much by optimizing anything other than WAL writing, because no
>> other wait event appears in more than 1% of the samples.
>
> I don't think you meant that, but this sounds a bit like there's
> nothing to do performance-wise - I don't think that's true. There's no
> significant lock contention, yes. But that obviously doesn't mean we
> can't use resources more efficiently.

Yeah, right. Doing the same stuff faster will surely help. It just
doesn't look like we can get anywhere by improving the locking, unless
it's something related to WAL writing.

>> Second, ClientRead becomes a bigger and bigger issue as the number of
>> clients increases; by 192 clients, it appears in 45% of the samples.
>> That basically means that pgbench is increasingly unable to keep up
>> with the server; for whatever reason, it suffers more than the server
>> does from the increasing lack of CPU resources.
>
> Isn't that more a question of pgbench threads being preempted, so
> server processes can run? Since there's not enough hardware threads
> for both servers and clients to always run, you *have* to switch
> between them.
> Usually a read from the client after processing a query will cause a
> directed wakeup of the other thread (which is why it's not a very
> frequently observed state), but if the queue of to-be-run tasks is very
> long that'll not happen.

Yeah, maybe I shouldn't have said it suffers more than the server, but
rather along with the server.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] CVE-2016-1238 fix breaks (at least) pg_rewind tests
On 09/12/2016 11:08 AM, Michael Paquier wrote:
On Fri, Sep 9, 2016 at 6:49 AM, Andres Freund wrote:
On 2016-09-08 18:13:06 -0300, Alvaro Herrera wrote:

I can't vouch for the windows stuff, and the invocations indeed look
vulnerable.  I'm not sure if the fix actually matters on windows, given
. is the default for pretty much everything there.

Well, maybe it doesn't matter now, but as I understand it the fix is
going to enter the next stable upstream perl, so it'll fail eventually.
It'd be saner to just fix the thing completely so that we can forget
about it.

Yea, it'd need input from somebody on windows.  Michael?  What happens
if you put a line to remove . from @INC (like upthread) in the msvc
stuff?

I haven't tested that directly because I am not sure how to enforce @INC
when running the prove command through system(), and there is no point
in using pop on @INC directly in vcregress.pl as that would just be
ignored in the system() call.  But to be short: this will blow up, as
"." is part of the default @INC per perl -V if one uses ActivePerl.  So
it looks fair to me to append the local source directory as well to what
is included.  You can actually do that by just adding $dir to
$ENV{PERL5LIB} in vcregress.pl and that would be fine.

I committed a fix for the unix Makefile, per Andres' original
suggestion, because this started to bother me.

I tried reproducing this on Windows by hacking Prove.pm to remove '.'
from @INC, but I wasn't able to make that fail.  So I didn't do anything
about MSVC for now.  Let's fix vcregress.pl when someone reports an
actual problem on Windows.

- Heikki
Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work
On 2016-10-06 00:06:33 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Hm. After a long battle of head vs. wall I think I see what the problem
> > is. For the fallback atomics implementation I somehow had assumed that
> > pg_atomic_write_u32() doesn't need to lock, as it's just an unlocked
> > write. But that's not true, because it has to cause
> > pg_atomic_compare_exchange_u32 to fail.
>
> Hah ... obvious once you see it.
>
> > For me the problem often takes a lot longer to reproduce (once only
> > after 40min), could you run with the attached patch, and see whether
> > that fixes things for you?
>
> For me, with the described test case, HEAD fails within a minute,
> two times out of three or so.  I've not reproduced it after half an
> hour of beating on this patch.  Looks good.

It's not quite there yet, unfortunately.  At the moment
pg_atomic_write_u32() is used for local buffers - and we explicitly
don't want that to be locking for temp buffers
(c.f. 6b93fcd149329d4ee7319561b30fc15a573c6307).

Don't really have a great idea about addressing this, besides either
just living with the lock for temp buffers on fallback platforms (which
don't have much practical relevance), or introducing
pg_atomic_unlocked_write_u32() or something.  Neither seems great.

Regards,

Andres
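The invariant under discussion - in the fallback implementation, a plain store would bypass the spinlock that pg_atomic_compare_exchange_u32() relies on, so the write has to take that lock too - can be sketched with a simplified, single-threaded model. All names here are hypothetical shorthand; the real fallback code in PostgreSQL's atomics layer embeds an actual spinlock in the atomic variable:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the fallback pg_atomic_uint32: the "locked"
 * field models the embedded spinlock that serializes all operations. */
typedef struct
{
    bool     locked;    /* would be a real spinlock in PostgreSQL */
    uint32_t value;
} sim_atomic_uint32;

static void sim_lock(sim_atomic_uint32 *p)   { p->locked = true; }
static void sim_unlock(sim_atomic_uint32 *p) { p->locked = false; }

/* Fallback compare-exchange: correct only if *every* writer takes the
 * same lock - which is exactly why an unlocked write would break it. */
static bool
sim_compare_exchange(sim_atomic_uint32 *p, uint32_t *expected,
                     uint32_t newval)
{
    bool ret;

    sim_lock(p);
    if (p->value == *expected)
    {
        p->value = newval;
        ret = true;
    }
    else
    {
        *expected = p->value;   /* report the value actually seen */
        ret = false;
    }
    sim_unlock(p);
    return ret;
}

/* The fix being discussed: the write must go through the same lock,
 * rather than being a bare store a concurrent CAS could interleave with. */
static void
sim_write(sim_atomic_uint32 *p, uint32_t val)
{
    sim_lock(p);
    p->value = val;
    sim_unlock(p);
}
```

In a real multi-threaded fallback, a bare store could land between a CAS's compare and its store; this model only shows the locking structure, not the race itself.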
Re: [HACKERS] Is it time to kill support for very old servers?
This thread gets me thinking about the definition of "support."  While
support in practice seems to primarily relate to fixes/updates to the
supported version itself, it could just as well apply to
interoperability support by newer versions.

Given that the standard PostgreSQL upgrade process involves upgrading
clients first and using pg_dump from the newer version, it is reasonable
to assume that the clients/utilities for a given version would support
interacting with any prior version that was not EOL at the time the new
major version was released.  In other words, 9.6 was released last
month, the same month that 9.1 went EOL, so 9.6 clients should work with
9.1 through 9.6 servers, but from my perspective there is no need to
*guarantee* that 10 would do so.  The standard caveats apply.  A new
version *might* work with an unsupported older version, but no assurance
is offered.

This is effectively a 5-year upgrade "grace period" *after* the EOL date
of a given version, which seems plenty generous.  Defining the term of
backward compatibility support might be useful in the future when these
types of questions arise.

Cheers,

Steve

On Fri, Oct 7, 2016 at 9:06 AM, Tom Lane wrote:
> Robert Haas writes:
> > On Fri, Oct 7, 2016 at 11:34 AM, Tom Lane wrote:
> >> Greg Stark writes:
> >>> For another there may be binary-only applications or drivers out there
> >>> that are using V2 for whatever reason.
>
> >> The problem with letting it just sit there is that we're not, in fact,
> >> testing it.  If we take the above argument seriously then we should
> >> provide some way to configure libpq to prefer V2 and run regression
> >> tests in that mode.  Otherwise, if/when we break it, we'll never know it
> >> till we get field reports.
>
> > I agree with that.  I think it would be fine to keep V2 support if
> > somebody wants to do the work to let us have adequate test coverage,
> > but if nobody volunteers I think we might as well rip it out.
> > I don't
> > particularly enjoy committing things only to be told that they've
> > broken something I can't test without unreasonable effort.
>
> When I wrote the above I was thinking of an essentially user-facing
> libpq feature, similar to the one JDBC has, to force use of V2 protocol.
> But actually, for testing purposes, I don't think that's what we want.
> Any such feature would fail to exercise libpq's logic for falling back
> from V3 to V2 when it connects to an old server, which is surely
> something we'd like to test without actually having a pre-7.4 server
> at hand.
>
> So what I'm thinking is it'd be sufficient to do something like
> this in pqcomm.h:
>
> +#ifndef FORCE_OLD_PROTOCOL
> #define PG_PROTOCOL_LATEST PG_PROTOCOL(3,0)
> +#else /* make like a pre-7.4 server for testing purposes */
> +#define PG_PROTOCOL_LATEST PG_PROTOCOL(2,0)
> +#endif
>
> which would cause the server to reject 3.0 requests just as if it were
> ancient.  Then we could test with that #define, maybe have a buildfarm
> critter doing it.  (This might break pqmq.c though, so we might need
> to work slightly harder than this.)
>
> Also, I realized while perusing this that the server still has support
> for protocol 1.0 (!).  That's *definitely* dead code.  There's not much
> of it, but still, I'd rather rip it out than continue to pretend it's
> supported.
>
> regards, tom lane
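For context on the #define above: PG_PROTOCOL packs the (major, minor) version pair into one 32-bit integer, so lowering PG_PROTOCOL_LATEST to 2.0 makes the server reject any 3.0 startup request. The version macros below match pqcomm.h, but server_accepts() is a hypothetical stand-in for the real startup-packet handling, shown only to illustrate the check being exercised:

```c
#include <stdint.h>

/* Version macros as in src/include/libpq/pqcomm.h. */
#define PG_PROTOCOL(m, n)     (((m) << 16) | (n))
#define PG_PROTOCOL_MAJOR(v)  ((v) >> 16)
#define PG_PROTOCOL_MINOR(v)  ((v) & 0x0000ffff)

#ifndef FORCE_OLD_PROTOCOL
#define PG_PROTOCOL_LATEST    PG_PROTOCOL(3, 0)
#else
/* make like a pre-7.4 server for testing purposes */
#define PG_PROTOCOL_LATEST    PG_PROTOCOL(2, 0)
#endif

/* Hypothetical sketch of the server-side acceptance check: a request
 * newer than PG_PROTOCOL_LATEST is rejected, which is what triggers
 * libpq's retry with the older protocol version. */
static int
server_accepts(uint32_t requested)
{
    return requested <= PG_PROTOCOL_LATEST;
}
```

Compiling with -DFORCE_OLD_PROTOCOL would then make server_accepts() reject PG_PROTOCOL(3, 0), exactly the behavior a buildfarm critter would exercise.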
Re: [HACKERS] pgbench vs. wait events
On 10/7/16 10:42 AM, Andres Freund wrote:
> Hi,
>
> On 2016-10-06 20:52:22 -0700, Alfred Perlstein wrote:
>> This contention on WAL reminds me of another scenario I've heard about
>> that was similar.  To fix things what happened was that the first
>> person to block would be responsible for writing out all buffers for
>> anyone blocked behind "him".
>
> We pretty much do that already.  But while that's happening, the other
> would-be-writers show up as blocking on the lock.  We use kind of an odd
> locking model for the waiters (LWLockAcquireOrWait()), which waits for
> the lock to be released, but doesn't try to acquire it afterwards.
> Instead the wal position is rechecked, and in many cases we'll be done
> afterwards, because enough has been written out.
>
> Greetings,
>
> Andres Freund

Are the batched writes all done before fsync is called?  Are you sure
that A only calls fsync after flushing all the buffers from B, C, and D?
Or will it fsync twice?  Is there instrumentation to show that?  I know
there's a tremendous level of skill involved in this code, but simply
asking in case there are some tricks.

Another strategy that may work is intentionally waiting/buffering a few
ms between flushes/fsyncs - for example, making sure that the number of
flushes per second doesn't exceed some configurable amount, because each
flush likely eats at least one iop from the disk and there is a maximum
iops per disk, so might as well buffer more if you're exceeding that
iops count.  You'll trade some latency, but gain throughput for doing
that.  Does this make sense?  Again, apologies if this has been covered.

Is there a whitepaper or blog post or clear way I can examine the
algorithm wrt locks/buffering for flushing WAL logs?

-Alfred
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Realistically though, how much would code coverage info have helped?
> Code coverage on a back branch would not have told you about whether
> it leaves blobs behind in the final regression DB state.  Code coverage
> on HEAD might have helped you notice some aspects of this failure, but
> it would not have told you about the same query failing before 7.4
> that worked later.

The code coverage report is exactly what I was using to figure out what
was being tested in pg_dump and what wasn't.  Many of the tests that are
included in the new TAP testing framework that I wrote for pg_dump were
specifically to provide code coverage and did improve the report.

If the regression tests in older versions were updated to make sure that
all the capabilities of pg_dump in those versions were tested, then my
testing with the regression test databases would have shown that the
newer version of pg_dump wasn't handling those cases correctly.  That
would require more comprehensive testing to be done in those
back-branches though, which would require more than just the code
coverage tool being included, that's true.

Another approach would be to figure out a way for the newer testing
framework in HEAD to be run against older versions, though we'd need a
field which indicates which version of PG a given test should be run
against, as there are certainly tests of newer capabilities than older
versions supported.

Ultimately, I'm afraid we may have to just punt on the idea of this kind
of testing being done using the same testing structure that exists in
HEAD and is used in the buildfarm.  That would be unfortunate, but I'm
not quite sure how you could have a buildfarm member that runs every
major version between 8.0 and HEAD and knows how to tell the HEAD
build system what all the ports are for all those versions to connect to
and run tests against.

Thanks!

Stephen
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > > Stephen Frost wrote:
> > > > What would be really nice would be code coverage information for
> > > > the back-branches also, as that would allow us to figure out what
> > > > we're missing coverage for.  I realize that we don't like adding
> > > > new things to back-branches as those changes could impact
> > > > packagers, but that might not impact them since that only runs
> > > > when you run 'make coverage'.
> > >
> > > Hmm?  9.1 already has "make coverage", so there's nothing to
> > > backpatch.  Do you mean to backpatch that infrastructure even
> > > further back than that?
> >
> > I wasn't sure how far back it went, but if it's only to 9.1, then yes,
> > farther than that.  Specifically, to as far back as we wish to provide
> > support for pg_dump, assuming it's reasonable to do so.
>
> I said 9.1 because that's the oldest we support, but it was added in
> 8.4.
>
> Do you really want to go back to applying patches back to 7.0?  That's
> brave.

Hrm.  My thought had actually been "back to whatever we decide we want
pg_dump to support."  The discussion about that seems to be trending
towards 8.0 rather than 7.0, but you bring up an interesting point about
whether we actually want to back-patch things that far.  I guess my
thinking is that if we decide that 8.0 is the answer, then we should at
least be open to back-patching things that allow us to test that we are
actually still supporting 8.0, and maybe that even means having a
buildfarm member or two which checks back that far.

> > > Or perhaps you are saying that coverage.pg.org should report results
> > > for each branch separately?  We could do that ...
> >
> > This would certainly be nice to have, but the first is more important.
> > coverage.pg.org is nice to tell people "hey, here's where you can look
> > to find what we aren't covering", but when you're actually hacking on
> > code, you really want a much faster turn-around
>
> True.  We could update things in coverage.postgresql.org much faster.
> Right now it's twice a day, but if we enlarge the machine I'm sure we
> can do better (yes, we can do that pretty easily).  Also, to make it
> faster, we could install ccache 3.10 in that machine, although that
> would be against our regular pginfra policy.
>
> At some point I thought about providing reports for each day, so that
> we can see how it has improved over time, but that may be too much :-)
>
> > and you'd like that pre-commit too.
>
> Yeah, that's a good point.

This is the real issue, imv, with coverage.pg.org.  I still like having
it, and having stats kept over time to see how our code coverage
improves would be nice, but the coverage.pg.org site isn't as useful for
active development.

Thanks!

Stephen
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Alvaro Herrera writes:
> Stephen Frost wrote:
>> I wasn't sure how far back it went, but if it's only to 9.1, then yes,
>> farther than that.  Specifically, to as far back as we wish to provide
>> support for pg_dump, assuming it's reasonable to do so.

> Do you really want to go back to applying patches back to 7.0?  That's
> brave.

Branches before about 7.3 or 7.4 don't build cleanly on modern tools.
In fact, they don't even build cleanly on my old HPUX 10.20 box ...
I just tried, and they have problems with the bison and flex I have
installed there now.  As a data point, that bison executable bears a
file date of Jan 31 2003.  Andres reported something similar in the
year-or-two-ago thread that was mentioned earlier.

This doesn't even consider optional features; I wasn't trying to build
SSL support for instance, but I'm pretty sure OpenSSL has been a moving
target over that kind of time span.

So I think trying to collect code coverage info on those branches is
nuts.  Maybe we could sanely do it for the later 8.x releases.

Realistically though, how much would code coverage info have helped?
Code coverage on a back branch would not have told you about whether
it leaves blobs behind in the final regression DB state.  Code coverage
on HEAD might have helped you notice some aspects of this failure, but
it would not have told you about the same query failing before 7.4
that worked later.

regards, tom lane
Re: [HACKERS] pgbench vs. wait events
Hi,

On 2016-10-06 20:52:22 -0700, Alfred Perlstein wrote:
> This contention on WAL reminds me of another scenario I've heard about
> that was similar.
>
> To fix things what happened was that the first person to block would be
> responsible for writing out all buffers for anyone blocked behind
> "him".

We pretty much do that already.  But while that's happening, the other
would-be-writers show up as blocking on the lock.  We use kind of an odd
locking model for the waiters (LWLockAcquireOrWait()), which waits for
the lock to be released, but doesn't try to acquire it afterwards.
Instead the wal position is rechecked, and in many cases we'll be done
afterwards, because enough has been written out.

Greetings,

Andres Freund
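The waiter pattern described here can be sketched as a simplified, single-threaded model. Everything below is hypothetical shorthand for the real logic in XLogFlush(): acquire_or_wait() stands in for LWLockAcquireOrWait() on WALWriteLock, and when it reports that we merely waited, it simulates the previous lock holder having completed a group flush that covered our record:

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared state; in the real code these are protected by locks. */
static uint64_t flushed_upto = 0;   /* how far WAL is durably on disk */
static uint64_t write_request = 0;  /* highest LSN anyone asked to flush */

/* Outcome injected by the caller: did we get the lock, or only wait? */
static bool acquired;

/* Stand-in for LWLockAcquireOrWait(WALWriteLock): if we only waited,
 * simulate the holder having flushed everything requested so far. */
static bool
acquire_or_wait(void)
{
    if (!acquired)
    {
        flushed_upto = write_request;
        return false;
    }
    return true;
}

/* Flush WAL up to my_lsn; returns true if *we* performed the flush. */
static bool
wal_flush(uint64_t my_lsn)
{
    if (my_lsn > write_request)
        write_request = my_lsn;

    while (flushed_upto < my_lsn)
    {
        if (!acquire_or_wait())
        {
            /* We only waited: recheck the flushed position, since the
             * holder's batch very often covered our record too. */
            continue;
        }

        /* We hold the lock: flush everything requested so far (a group
         * flush), not just our own record, then release. */
        flushed_upto = write_request;
        return true;
    }
    return false;   /* someone else's flush covered us */
}
```

The point of the model is the recheck after waiting: in many cases a waiter returns without ever taking the lock, which is why the waiters show up as blocked even though no redundant I/O happens.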
Re: [HACKERS] pgbench vs. wait events
Hi,

On 2016-10-06 14:38:56 -0400, Robert Haas wrote:
> Obviously, there's a vast increase in TPS, and the backends seem to
> spend most of their time actually doing work.  ClientRead is now the
> overwhelmingly dominant wait event, although wal_insert and
> WALWriteLock contention is clearly still a significant problem.
> Contention on other locks is apparently quite rare.  Notice that
> client reads are really significant here - more than 20% of the time
> we sample what a backend is doing, it's waiting for the next query.
> It seems unlikely that performance on this workload can be improved
> very much by optimizing anything other than WAL writing, because no
> other wait event appears in more than 1% of the samples.

I don't think you meant that, but this sounds a bit like there's
nothing to do performance-wise - I don't think that's true.  There's no
significant lock-contention, yes.  But that obviously doesn't mean we
can't use resources more efficiently.

> Second, ClientRead becomes a bigger and bigger issue as the number of
> clients increases; by 192 clients, it appears in 45% of the samples.
> That basically means that pgbench is increasingly unable to keep up
> with the server; for whatever reason, it suffers more than the server
> does from the increasing lack of CPU resources.

Isn't that more a question of pgbench threads being preempted, so server
processes can run?  Since there's not enough hardware threads for both
servers and clients to always run, you *have* to switch between them.
Usually a read from the client after processing a query will cause a
directed wakeup of the other thread (which is why it's not a very
frequently observed state), but if the queue of to-be-run tasks is very
long that'll not happen.

Greetings,

Andres Freund
Re: [HACKERS] pgbench vs. wait events
Hi,

On 2016-10-06 18:15:58 -0400, Robert Haas wrote:
> That's pretty tight, especially since I now notice Andres also left a
> postmaster running on this machine back in April, with
> shared_buffers=8GB.

Oops, sorry for that.

- Andres
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> It might be a good idea to retroactively modify 9.1-9.3 so that there
>> are some blobs in the final state, for purposes of testing pg_dump and
>> pg_upgrade.

> I certainly think that would be a good idea.  I thought we had been
> insisting on coverage via the regression tests for a lot farther back
> than 9.4, though perhaps that was only for newer features and we never
> went back and added it for existing capabilities.

Well, there were regression tests for blobs for a long time, but they
carefully cleaned up their mess.  It was only in 70ad7ed4e that we made
them leave some blobs behind.

I took a quick look at back-patching that commit, but the test would
need to be rewritten to not depend on features that don't exist further
back (like \gset), which likely explains why I didn't do it at the time.

regards, tom lane
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Stephen Frost wrote:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > Stephen Frost wrote:
> > > What would be really nice would be code coverage information for the
> > > back-branches also, as that would allow us to figure out what we're
> > > missing coverage for.  I realize that we don't like adding new
> > > things to back-branches as those changes could impact packagers, but
> > > that might not impact them since that only runs when you run 'make
> > > coverage'.
> >
> > Hmm?  9.1 already has "make coverage", so there's nothing to
> > backpatch.  Do you mean to backpatch that infrastructure even further
> > back than that?
>
> I wasn't sure how far back it went, but if it's only to 9.1, then yes,
> farther than that.  Specifically, to as far back as we wish to provide
> support for pg_dump, assuming it's reasonable to do so.

I said 9.1 because that's the oldest we support, but it was added in
8.4.

Do you really want to go back to applying patches back to 7.0?  That's
brave.

> > Or perhaps you are saying that coverage.pg.org should report results
> > for each branch separately?  We could do that ...
>
> This would certainly be nice to have, but the first is more important.
> coverage.pg.org is nice to tell people "hey, here's where you can look
> to find what we aren't covering", but when you're actually hacking on
> code, you really want a much faster turn-around

True.  We could update things in coverage.postgresql.org much faster.
Right now it's twice a day, but if we enlarge the machine I'm sure we
can do better (yes, we can do that pretty easily).  Also, to make it
faster, we could install ccache 3.10 in that machine, although that
would be against our regular pginfra policy.

At some point I thought about providing reports for each day, so that we
can see how it has improved over time, but that may be too much :-)

> and you'd like that pre-commit too.

Yeah, that's a good point.
--
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > What would be really nice would be code coverage information for the
> > back-branches also, as that would allow us to figure out what we're
> > missing coverage for.  I realize that we don't like adding new things
> > to back-branches as those changes could impact packagers, but that
> > might not impact them since that only runs when you run 'make
> > coverage'.
>
> Hmm?  9.1 already has "make coverage", so there's nothing to backpatch.
> Do you mean to backpatch that infrastructure even further back than
> that?

I wasn't sure how far back it went, but if it's only to 9.1, then yes,
farther than that.  Specifically, to as far back as we wish to provide
support for pg_dump, assuming it's reasonable to do so.

> Or perhaps you are saying that coverage.pg.org should report results
> for each branch separately?  We could do that ...

This would certainly be nice to have, but the first is more important.
coverage.pg.org is nice to tell people "hey, here's where you can look
to find what we aren't covering", but when you're actually hacking on
code, you really want a much faster turn-around and you'd like that
pre-commit too.

Thanks!

Stephen
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Stephen Frost wrote:
> What would be really nice would be code coverage information for the
> back-branches also, as that would allow us to figure out what we're
> missing coverage for.  I realize that we don't like adding new things
> to back-branches as those changes could impact packagers, but that
> might not impact them since that only runs when you run 'make
> coverage'.

Hmm?  9.1 already has "make coverage", so there's nothing to backpatch.
Do you mean to backpatch that infrastructure even further back than
that?

Or perhaps you are saying that coverage.pg.org should report results for
each branch separately?  We could do that ...

--
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > Ugh.  Thanks for fixing.  I had tested back to 7.4 with the
> > regression tests but either those didn't include blobs or something
> > got changed after my testing and I didn't re-test all the way back
> > when I should have.
>
> It looks like the final state of the regression tests doesn't include
> any blobs before about 9.4.  You wouldn't have seen any results worse
> than a warning message in 7.4-8.4, unless there were some blobs so that
> the data extraction loop got iterated.
>
> It might be a good idea to retroactively modify 9.1-9.3 so that there
> are some blobs in the final state, for purposes of testing pg_dump and
> pg_upgrade.

I certainly think that would be a good idea.  I thought we had been
insisting on coverage via the regression tests for a lot farther back
than 9.4, though perhaps that was only for newer features and we never
went back and added it for existing capabilities.

What would be really nice would be code coverage information for the
back-branches also, as that would allow us to figure out what we're
missing coverage for.  I realize that we don't like adding new things to
back-branches as those changes could impact packagers, but that might
not impact them since that only runs when you run 'make coverage'.

Thanks!

Stephen
Re: [HACKERS] Question / requests.
Robert Haas wrote:
> On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte wrote:
> > On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote:
> >> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera wrote:
> > ...
> >>> I wonder if the real answer isn't just to disallow -f with parallel
> >>> vacuuming.
>
> >> Seems like we should figure out which catalog tables are needed in
> >> order to perform a VACUUM, and force those to be done last and one at
> >> a time.
>
> > Is the system catalog a bottleneck for people who have real use for
> > parallel vacuum?
>
> I don't know, but it seems like the documentation for vacuumdb
> currently says, more or less, "Hey, if you use -j with -f, it may not
> work!", which seems unacceptable to me.  It should be the job of the
> person writing the feature to make it work in all cases, not the job
> of the person using the feature to work around the problem when it
> doesn't.

The most interesting use case of vacuumdb is lazy vacuuming, I think, so
committing that patch as it was submitted previously was a good step
forward even if it didn't handle VACUUM FULL 100%.  I agree that it's
better to have both modes Just Work in parallel, which is the point of
this subsequent patch.  So let's move forward.  I support Francisco's
effort to make -f work with -j.  I don't have a strong opinion on which
of the various proposals presented so far is the best way to implement
it, but let's figure that out and get it done.

If you want to argue that vacuumdb -f -j not working properly is a bug
in the vacuumdb -j commit, ISTM you're arguing that we should backpatch
whatever we come up with as a bug fix, but I would disagree with that.

--
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
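Robert's earlier suggestion - figure out which catalog tables a VACUUM needs, and force those to be done last and one at a time - could be approximated by partitioning the table list before dispatching jobs. A hedged sketch, with hypothetical helper names (this is not vacuumdb's actual code, and real catalog detection would consult pg_class rather than match a name prefix):

```c
#include <string.h>

/* Crude stand-in for real catalog detection. */
static int
is_catalog_table(const char *qualified_name)
{
    return strncmp(qualified_name, "pg_catalog.", 11) == 0;
}

/* Partition the table list: non-catalog tables go to the parallel batch,
 * catalog tables to a serial tail to be vacuumed one at a time at the
 * end, so concurrent VACUUM FULL never targets a catalog the parallel
 * workers themselves need. */
static void
schedule_tables(const char **tables, int n,
                const char **parallel, int *np,
                const char **serial, int *ns)
{
    *np = *ns = 0;
    for (int i = 0; i < n; i++)
    {
        if (is_catalog_table(tables[i]))
            serial[(*ns)++] = tables[i];
        else
            parallel[(*np)++] = tables[i];
    }
}
```

Whether this, or one of the other proposals in the thread, is the right scheduling rule is exactly the open question being discussed.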
Re: [HACKERS] Patch to implement pg_current_logfile() function
On 03/10/2016 at 16:09, Christoph Berg wrote:
> Hi Gilles,
>
> I've just tried v4 of the patch.  The OID you picked for
> pg_current_logfile doesn't work anymore, but after increasing it
> randomly by 1, it compiles.  I like the added functionality,
> especially that "select pg_read_file(pg_current_logfile());" just
> works.

I've changed the OID and some other things in this v5 of the patch, see
below.

> What bugs me is the new file "pg_log_file" in PGDATA.  It clutters the
> directory listing.  I wouldn't know where else to put it, but you might
> want to cross-check that with the thread that is trying to reshuffle
> the directory layout to make it easier to exclude files from backups.
> (Should this file be part of backups?)
>
> It's probably correct to leave the file around on shutdown (given it's
> still a correct pointer).  But there might be a case for removing it on
> startup if logging_collector isn't active anymore.

The file has been renamed to current_logfile and is now removed at
startup if logging_collector is not active.  The file can be excluded
from a backup, but otherwise if it is restored it will be removed or
overwritten at startup.  Perhaps the file can give useful information in
a backup, to know the last log file active at backup time, but I'm not
sure that has any interest.

I'm not sure which thread is talking about reshuffling the directory
layout; please give me a pointer if this is not the thread about
renaming pg_xlog and pg_clog.  In the future, if we have a directory to
store files that must be excluded from backups or status files, it will
be easy to move this file there.  I will follow such a change.

> Also, pg_log_file is tab-completion-unfriendly, it conflicts with
> pg_log/.  Maybe name it current_logfile?

Right, done.
> Another thing that might possibly be improved is csv logging:
>
> # select pg_read_file(pg_current_logfile());
> pg_read_file
> ───
> LOG:  ending log output to stderr↵
> HINT:  Future log output will go to log destination "csvlog".↵
>
> -rw--- 1 cbe staff 1011 Okt 3 15:06 postgresql-2016-10-03_150602.csv
> -rw--- 1 cbe staff   96 Okt 3 15:06 postgresql-2016-10-03_150602.log
>
> ... though it's unclear what to do if both stderr and csvlog are
> selected.
>
> Possibly NULL should be returned if only "syslog" is selected.
> (Maybe remove pg_log_file once 'HINT: Future log output will go to
> log destination "syslog".' is logged?)

I had totally missed that we can have log_destination set to stderr and
csvlog at the same time, so pg_current_logfile() might return two
filenames in this case.  I've changed the function to return a setof
record to report the last stderr or csv log file, or both.

Another major change is that the current log filename list is also
updated after a configuration reload, not just after a startup or a log
rotation.  So if you are switching from stderr to csvlog or both, you
will see the change in current_logfile immediately instead of waiting
for the next log rotation.

* log_destination set to csvlog only:

postgres=# select * from pg_current_logfile();
          pg_current_logfile
---------------------------------------
 pg_log/postgresql-2016-10-07_1646.csv
(1 row)

* log_destination set to stderr only:

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# select * from pg_current_logfile();
          pg_current_logfile
---------------------------------------
 pg_log/postgresql-2016-10-07_1647.log
(1 row)

* log_destination set to both stderr,csvlog:

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# select * from pg_current_logfile();
          pg_current_logfile
---------------------------------------
 pg_log/postgresql-2016-10-07_1648.log
 pg_log/postgresql-2016-10-07_1648.csv
(2 rows)

When logging_collector is disabled, this function returns NULL.
As the return type has changed to a setof, the query to read the file
needs to be changed too:

postgres=# SELECT pg_read_file(log) FROM pg_current_logfile() b(log);
 pg_read_file
--------------
 LOG:  duration: 0.182 ms  statement: select pg_read_file(log) from pg_current_logfile() file(log);+
(1 row)

I can change the return type to a single text[] if that looks better.

Thanks

--
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a588350..69a74af 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15480,6 +15480,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
Re: [HACKERS] Radix tree for character conversion
Robert Haas writes:
> On Fri, Oct 7, 2016 at 6:46 AM, Heikki Linnakangas wrote:
>> Ouch. We should find and document an authoritative source for all the
>> mappings we have...
>>
>> I think the next steps here are:
>>
>> 1. Find an authoritative source for all the existing mappings.
>> 2. Generate the radix tree files directly from the authoritative sources,
>> instead of the existing *.map files.
>> 3. Completely replace the existing binary-search code with this.

> It might be best to convert using the existing map files, and then
> update the mappings later. Otherwise, when things break, you won't
> know what to blame.

I think I went through this exercise last year or so, and updated the notes about the authoritative sources where I was able to find one. In the remaining cases, I believe that the maps have been intentionally tweaked and we should be cautious about undoing that. Tatsuo-san might remember more about why they are the way they are.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is it time to kill support for very old servers?
Robert Haas writes:
> On Fri, Oct 7, 2016 at 11:34 AM, Tom Lane wrote:
>> Greg Stark writes:
>>> For another there may be binary-only applications or drivers out there
>>> that are using V2 for whatever reason.
>> The problem with letting it just sit there is that we're not, in fact,
>> testing it. If we take the above argument seriously then we should
>> provide some way to configure libpq to prefer V2 and run regression
>> tests in that mode. Otherwise, if/when we break it, we'll never know it
>> till we get field reports.

> I agree with that. I think it would be fine to keep V2 support if
> somebody wants to do the work to let us have adequate test coverage,
> but if nobody volunteers I think we might as well rip it out. I don't
> particularly enjoy committing things only to be told that they've
> broken something I can't test without unreasonable effort.

When I wrote the above I was thinking of an essentially user-facing libpq feature, similar to the one JDBC has, to force use of V2 protocol. But actually, for testing purposes, I don't think that's what we want. Any such feature would fail to exercise libpq's logic for falling back from V3 to V2 when it connects to an old server, which is surely something we'd like to test without actually having a pre-7.4 server at hand.

So what I'm thinking is it'd be sufficient to do something like this in pqcomm.h:

+#ifndef FORCE_OLD_PROTOCOL
 #define PG_PROTOCOL_LATEST	PG_PROTOCOL(3,0)
+#else
+/* make like a pre-7.4 server for testing purposes */
+#define PG_PROTOCOL_LATEST	PG_PROTOCOL(2,0)
+#endif

which would cause the server to reject 3.0 requests just as if it were ancient. Then we could test with that #define, maybe have a buildfarm critter doing it. (This might break pqmq.c though, so we might need to work slightly harder than this.)

Also, I realized while perusing this that the server still has support for protocol 1.0 (!). That's *definitely* dead code.
There's not much of it, but still, I'd rather rip it out than continue to pretend it's supported.

regards, tom lane
Re: [HACKERS] Question about pg_control usage
On Fri, Oct 7, 2016 at 8:24 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:

> Hi, hackers!
>
> I am examining various global variables in ShmemVariableCache,
> pg_control and pg_controldata. To force and debug xid wraparound,
> I've implemented a function that allows setting the nextXid value in the
> control file manually, but it doesn't work as expected.

Why not use pg_resetxlog -x ?

> After an investigation I found that, while loading global variables in
> StartupXlog(), we read the checkPoint location from ControlFile, and then
> read all the information from the checkPoint record. And never ever use
> the nextXid stored in ControlFile.
>
> What is the purpose of having CheckPoint stored in the control file then?
> I thought that it allows us to start up the server after a correct shutdown
> without any xlog, as said in the comment in pg_control.h:
>
> * Body of CheckPoint XLOG records. This is declared here because we keep
> * a copy of the latest one in pg_control for possible disaster recovery.
> * Changing this struct requires a PG_CONTROL_VERSION bump.
>
> But as far as I see, we never use this data.
>
> Could you please explain why we don't use the information from the control
> file in StartupXlog()?

I believe pg_resetxlog uses the data from the control file to create an artificial checkpoint record, and injects that record into an artificial xlog file, which will allow you to start the system. So it is a small use, but a sometimes very important one.

Cheers,

Jeff
Re: [HACKERS] Radix tree for character conversion
On Fri, Oct 7, 2016 at 6:46 AM, Heikki Linnakangas wrote:
> Ouch. We should find and document an authoritative source for all the
> mappings we have...
>
> I think the next steps here are:
>
> 1. Find an authoritative source for all the existing mappings.
> 2. Generate the radix tree files directly from the authoritative sources,
> instead of the existing *.map files.
> 3. Completely replace the existing binary-search code with this.

It might be best to convert using the existing map files, and then update the mappings later. Otherwise, when things break, you won't know what to blame.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_rewind and ssl test suites don't work if '.' is not in @INC
On 10/07/2016 03:05 PM, Michael Paquier wrote:
> On Fri, Oct 7, 2016 at 8:54 PM, Heikki Linnakangas wrote:
>> I'm a bit surprised no-one else has reported this yet. Have I missed a
>> report?
>
> Andres has reported that a couple of weeks back, on Debian testing like
> you:
> https://www.postgresql.org/message-id/20160908204529.flg6nivjuwp5v...@alap3.anarazel.de

Ah, thanks, I'll reply on that thread.

- Heikki
Re: [HACKERS] pgbench vs. wait events
On Thu, Oct 6, 2016 at 11:38 AM, Robert Haas wrote:
>
> Next, I tried lowering the scale factor to something that fits in
> shared buffers. Here are the results at scale factor 300:
>
>     14  Lock          | tuple
>     22  LWLockTranche | lock_manager
>     39  LWLockNamed   | WALBufMappingLock
>    331  LWLockNamed   | CLogControlLock
>    461  LWLockNamed   | ProcArrayLock
>    536  Lock          | transactionid
>    552  Lock          | extend
>    716  LWLockTranche | buffer_content
>    763  LWLockNamed   | XidGenLock
>   2113  LWLockNamed   | WALWriteLock
>   6190  LWLockTranche | wal_insert
>  25002  Client        | ClientRead
>  78466                |
>
> tps = 27651.562835 (including connections establishing)
>
> Obviously, there's a vast increase in TPS, and the backends seem to
> spend most of their time actually doing work. ClientRead is now the
> overwhelmingly dominant wait event, although wal_insert and
> WALWriteLock contention is clearly still a significant problem.
> Contention on other locks is apparently quite rare. Notice that
> client reads are really significant here - more than 20% of the time
> we sample what a backend is doing, it's waiting for the next query.
> It seems unlikely that performance on this workload can be improved
> very much by optimizing anything other than WAL writing, because no
> other wait event appears in more than 1% of the samples. It's not
> clear how much of the WAL-related stuff is artificial lock contention
> and how much of it is the limited speed at which the disk can rotate.

What happens if you turn fsync off? Once an xlog file is fully written, it is immediately fsynced, even if the backend is holding WALWriteLock or wal_insert (or both) at the time, and even if synchronous_commit is off. Assuming this machine has a BBU so that it doesn't have to wait for disk rotation, fsyncs are still expensive because the kernel has to find all the data and get it sent over to the BBU, while holding locks.
> Second, ClientRead becomes a bigger and bigger issue as the number of
> clients increases; by 192 clients, it appears in 45% of the samples.
> That basically means that pgbench is increasingly unable to keep up
> with the server; for whatever reason, it suffers more than the server
> does from the increasing lack of CPU resources.

I would be careful about that interpretation. If you asked pgbench, it would probably have the opposite opinion. The backend tosses its response at the kernel (which will never block, because the pgbench responses are all small and the kernel will buffer them) and then goes into ClientRead. After the backend goes into ClientRead, the kernel needs to find and wake up pgbench and deliver the response, and pgbench has to receive and process the response. Only then does it create a new query (I've toyed before with having pgbench construct the next query while it is waiting for a response on the previous one, but that didn't seem promising, and much of pgbench has been rewritten since then) and pass the query back to the kernel. Then the kernel has to find and wake up the backend and deliver the new query. So for a reasonable chunk of the time that the server thinks it is waiting for the client, the client also thinks it is waiting for the server.

I think we need to come up with some benchmarking queries which get more work done per round-trip to the database. And build them into the binary, because otherwise people won't use them as much as they should if they have to pass "-f" files around mailing lists and blog postings. For example, we could enclose the five statements of the TPC-B-like script in a single function which takes aid, bid, tid, and delta as arguments. And presumably we could drop the other two statements (BEGIN and COMMIT) as well, and rely on autocommit to get that job done. So we could go from 7 statements to 1.
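A sketch of what such a function could look like, folding the five data-modifying statements of the TPC-B-like script into one server-side call. This is only an illustration of the idea, not a worked-out proposal; the function name and exact statement set are invented here:

```sql
-- Hypothetical: one round-trip per transaction instead of seven.
CREATE FUNCTION tpcb_like(p_aid int, p_bid int, p_tid int, p_delta int)
RETURNS int AS $$
DECLARE
    v_abalance int;
BEGIN
    UPDATE pgbench_accounts SET abalance = abalance + p_delta
        WHERE aid = p_aid;
    SELECT abalance INTO v_abalance
        FROM pgbench_accounts WHERE aid = p_aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + p_delta
        WHERE tid = p_tid;
    UPDATE pgbench_branches SET bbalance = bbalance + p_delta
        WHERE bid = p_bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (p_tid, p_bid, p_aid, p_delta, now());
    RETURN v_abalance;
END;
$$ LANGUAGE plpgsql;
```

The custom script would then be just "SELECT tpcb_like(:aid, :bid, :tid, :delta);", relying on autocommit so that each call is its own transaction.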
> Third, Lock/transactionid and Lock/tuple become more and more common as the
> number of clients increases; these happen when two different pgbench
> threads deciding to hit the same row at the same time. Due to the
> birthday paradox this increases pretty quickly as the number of
> clients ramps up, but it's not really a server issue: there's nothing
> the server can do about the possibility that two or more clients pick
> the same random number at the same time.

What I have done in the past is chop a zero off from:

#define naccounts 100000

and recompile pgbench. Then you can increase the scale factor so that you have less contention on pgbench_branches while still fitting the data in shared_buffers, or in RAM.

Cheers,

Jeff
Re: [HACKERS] Question about pg_control usage
Anastasia Lubennikova writes:
> What is the purpose of having CheckPoint stored in control file then?

IIRC, we use some but not all of the fields. It seemed prudent (and simpler) to keep a copy of the whole thing.

regards, tom lane
Re: [HACKERS] Is it time to kill support for very old servers?
On Fri, Oct 7, 2016 at 11:34 AM, Tom Lane wrote:
> Greg Stark writes:
>> On Fri, Oct 7, 2016 at 3:06 PM, Tom Lane wrote:
>>> In the same line, maybe we should kill libpq's support for V2 protocol
>>> (which would make the cutoff 7.4). And maybe the server's support too,
>>> though that wouldn't save very much code. The argument for cutting this
>>> isn't so much that we would remove lots of code as that we're removing
>>> code that never gets tested, at least not by us.
>
>> Somehow removing the whole protocol support seems a bit different to
>> me than removing pg_dump logic. For one it's nice to be able to run a
>> modern psql against old servers so you can run a benchmark script.
>
> Yeah, but surely pre-7.4 servers are no longer of much interest for that;
> or if you want historical results you should also use a matching libpq.
>
>> For another there may be binary-only applications or drivers out there
>> that are using V2 for whatever reason.
>
> The problem with letting it just sit there is that we're not, in fact,
> testing it. If we take the above argument seriously then we should
> provide some way to configure libpq to prefer V2 and run regression
> tests in that mode. Otherwise, if/when we break it, we'll never know it
> till we get field reports.

I agree with that. I think it would be fine to keep V2 support if somebody wants to do the work to let us have adequate test coverage, but if nobody volunteers I think we might as well rip it out. I don't particularly enjoy committing things only to be told that they've broken something I can't test without unreasonable effort.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Stephen Frost writes:
> Ugh. Thanks for fixing. I had tested back to 7.4 with the regression
> tests but either those didn't include blobs or something got changed
> after my testing and I didn't re-test all the way back when I should
> have.

It looks like the final state of the regression tests doesn't include any blobs before about 9.4. You wouldn't have seen any results worse than a warning message in 7.4-8.4, unless there were some blobs so that the data extraction loop got iterated.

It might be a good idea to retroactively modify 9.1-9.3 so that there are some blobs in the final state, for purposes of testing pg_dump and pg_upgrade.

regards, tom lane
Re: [HACKERS] Question / requests.
Robert:

On Fri, Oct 7, 2016 at 3:20 PM, Robert Haas wrote:
> On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte wrote:
>> On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote:
>>> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera wrote:
>> ... I wonder if the real answer isn't just to disallow -f with
>> parallel vacuuming.
>>> Seems like we should figure out which catalog tables are needed in
>>> order to perform a VACUUM, and force those to be done last and one at
>>> a time.
>>
>> Is the system catalog a bottleneck for people who have real use for
>> parallel vacuum?

> I don't know, but it seems like the documentation for vacuumdb
> currently says, more or less, "Hey, if you use -j with -f, it may not
> work!", which seems unacceptable to me. It should be the job of the
> person writing the feature to make it work in all cases, not the job
> of the person using the feature to work around the problem when it
> doesn't.

That may be the case, but the only way to solve it seems to be to disallow the full-parallel combination, as suggested. OTOH, what I was asking was just whether people think the time gained by minimizing the serially-processed part of pg_catalog in the full-parallel case would be enough to warrant the increased code complexity and bug surface.

Anyway, I'll stick to my original plan even if someone decides to fix or disallow full-parallel, as I think it has its uses.

Francisco Olarte.
Re: [HACKERS] Is it time to kill support for very old servers?
Greg Stark writes:
> On Fri, Oct 7, 2016 at 3:06 PM, Tom Lane wrote:
>> In the same line, maybe we should kill libpq's support for V2 protocol
>> (which would make the cutoff 7.4). And maybe the server's support too,
>> though that wouldn't save very much code. The argument for cutting this
>> isn't so much that we would remove lots of code as that we're removing
>> code that never gets tested, at least not by us.

> Somehow removing the whole protocol support seems a bit different to
> me than removing pg_dump logic. For one it's nice to be able to run a
> modern psql against old servers so you can run a benchmark script.

Yeah, but surely pre-7.4 servers are no longer of much interest for that; or if you want historical results you should also use a matching libpq.

> For another there may be binary-only applications or drivers out there
> that are using V2 for whatever reason.

The problem with letting it just sit there is that we're not, in fact, testing it. If we take the above argument seriously then we should provide some way to configure libpq to prefer V2 and run regression tests in that mode. Otherwise, if/when we break it, we'll never know it till we get field reports.

regards, tom lane
[HACKERS] Question about pg_control usage
Hi, hackers!

I am examining various global variables in ShmemVariableCache, pg_control and pg_controldata. To force and debug xid wraparound, I've implemented a function that allows setting the nextXid value in the control file manually, but it doesn't work as expected.

After an investigation I found that, while loading global variables in StartupXlog(), we read the checkPoint location from ControlFile, and then read all the information from the checkPoint record. And never ever use the nextXid stored in ControlFile.

What is the purpose of having CheckPoint stored in the control file then? I thought that it allows us to start up the server after a correct shutdown without any xlog, as said in the comment in pg_control.h:

 * Body of CheckPoint XLOG records. This is declared here because we keep
 * a copy of the latest one in pg_control for possible disaster recovery.
 * Changing this struct requires a PG_CONTROL_VERSION bump.

But as far as I see, we never use this data.

Could you please explain why we don't use the information from the control file in StartupXlog()?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Amit Langote writes:
>> Just noticed that the getBlobs() query does not work for a 7.3 server
>> (maybe <= 7.3) due to the following change in commit 23f34fa4 [1]:
>
> Ugh.
>
>> I could fix that using the attached patch.
>
> There's more wrong than that, as you'd notice if you tried dumping
> a DB that actually had some LOs in it :-(. This obviously wasn't
> tested on anything older than 9.0.
>
> Will push a fix in a bit, as soon as I can boot up my dinosaur with
> a working 7.0 server to test that branch.

Ugh. Thanks for fixing. I had tested back to 7.4 with the regression tests but either those didn't include blobs or something got changed after my testing and I didn't re-test all the way back when I should have.

I wasn't able to (easily) get anything older than 7.4 to compile on my box, which is why I had stopped there.

In any case, thanks again for the fix.

Stephen
Re: [HACKERS] Is it time to kill support for very old servers?
On Fri, Oct 7, 2016 at 3:06 PM, Tom Lane wrote:
> pg_dump alleges support for dumping from servers back to 7.0. Would v10
> be a good time to remove some of that code? It's getting harder and
> harder to even compile those ancient branches, let alone get people to
> test against them (cf. 4806f26f9). My initial thought is to cut support
> for pre-7.3 or maybe pre-7.4 servers, as that would allow removal of
> support for cases where the server lacks schemas or pg_depend, each of
> which requires a fair deal of klugery in pg_dump.

I might be expected to be the holdout here but it seems sensible to me. Removing code in pg_dump to deal with lacking schemas and pg_depend seems like a major simplification.

> In the same line, maybe we should kill libpq's support for V2 protocol
> (which would make the cutoff 7.4). And maybe the server's support too,
> though that wouldn't save very much code. The argument for cutting this
> isn't so much that we would remove lots of code as that we're removing
> code that never gets tested, at least not by us.

If it's testing we're concerned about, IIRC the current servers can be arm-twisted into speaking V2 protocol. So it should be possible to test both modern servers and modern pg_dump using V2 protocol with a simple tweak.

Somehow removing the whole protocol support seems a bit different to me than removing pg_dump logic. For one it's nice to be able to run a modern psql against old servers so you can run a benchmark script. For another there may be binary-only applications or drivers out there that are using V2 for whatever reason.

--
greg
Re: [HACKERS] Is it time to kill support for very old servers?
On Fri, Oct 7, 2016 at 10:06 AM, Tom Lane wrote:
> pg_dump alleges support for dumping from servers back to 7.0. Would v10
> be a good time to remove some of that code? It's getting harder and
> harder to even compile those ancient branches, let alone get people to
> test against them (cf. 4806f26f9). My initial thought is to cut support
> for pre-7.3 or maybe pre-7.4 servers, as that would allow removal of
> support for cases where the server lacks schemas or pg_depend, each of
> which requires a fair deal of klugery in pg_dump.

It's been a long time since I've seen any 7.x versions in the wild, and the pg_dump stuff is a nuisance to maintain. +1 for dropping support for anything pre-7.4. Anybody who is upgrading from 7.3 to 10 isn't likely to want to do it in one swing anyway. In the real world, few upgrades span 14 major versions.

> In the same line, maybe we should kill libpq's support for V2 protocol
> (which would make the cutoff 7.4). And maybe the server's support too,
> though that wouldn't save very much code. The argument for cutting this
> isn't so much that we would remove lots of code as that we're removing
> code that never gets tested, at least not by us.
>
> One small problem with cutting libpq's V2 support is that the server's
> report_fork_failure_to_client() function still sends a V2-style message.
> We could change that in HEAD, certainly, but we don't really want modern
> libpq unable to parse such a message from an older server. Possibly we
> could handle that specific case with a little special-purpose code and
> still be able to take out most of fe-protocol2.c.

I definitely think we can't remove client support for anything that modern servers still send, even if we change 10devel so that it no longer does so. I think we have to at least wait until all versions that might send a message are EOL before removing client-side support for it -- and maybe a bit longer than that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] FSM corruption leading to errors
06.10.2016 20:59, Pavan Deolasee:

I investigated a bug report from one of our customers and it looked very similar to previous bug reports here [1], [2], [3] (and probably more). In these reports, the error looks something like this:

ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes

I traced it to the following code in MarkBufferDirtyHint(). The function returns without setting the DIRTY bit on the standby:

    /*
     * If we're in recovery we cannot dirty a page because of a hint.
     * We can set the hint, just not dirty the page as a result so the
     * hint is lost when we evict the page or shutdown.
     *
     * See src/backend/storage/page/README for longer discussion.
     */
    if (RecoveryInProgress())
        return;

freespace.c freely uses MarkBufferDirtyHint() whenever changes are made to the FSM. I think that's usually alright, because FSM changes are not WAL-logged, and if the FSM ever returns a block with less free space than the caller needs, the caller is prepared to update the FSM and request a new block. But if it returns a block that is outside the size of the relation, then we're in trouble. The very next ReadBuffer() fails to handle such a block and throws the error.

When a relation is truncated, the FSM is truncated too, to remove references to the heap blocks that are being truncated. But since the FSM buffer may not be marked DIRTY on the standby, if the buffer gets evicted from the buffer cache, the on-disk copy of the FSM page may be left with references to the truncated heap pages. When the standby is later promoted to be the master, and an insert/update is attempted on the table, the FSM may return a block that is outside the valid range of the relation. That results in the said error.

Once this was clear, it was easy to put together a fully reproducible test case. See the attached script; you'll need to adjust it to your environment.
This affects all releases starting with 9.3, and the script can reproduce the problem on all these releases.

I believe the fix is very simple. The FSM change during truncation is critical, and the buffer must be marked by MarkBufferDirty(), i.e. those changes must make it to disk. I think it's alright not to WAL-log them because XLOG_SMGR_TRUNCATE will redo() them if a crash occurs. But they must not be lost across a checkpoint. Also, since this happens only during relation truncation, I don't see any problem from a performance perspective.

What bothers me is how to fix the problem for already affected standbys. If the FSM for some table is already corrupted at the standby, users won't notice it until the standby is promoted to be the new master. If the standby starts throwing errors suddenly after failover, it will be a very bad situation for the users, as we noticed with our customers. The fix is simple and users can just delete the FSM (and VACUUM the table), but that doesn't sound nice, and they would not know about it until they see the problem.

One idea is to always check if the block returned by the FSM is outside the valid range, and discard such blocks after updating the FSM (the attached patch does that). The problem with that approach is that RelationGetNumberOfBlocks() is not really cheap, and invoking it every time the FSM is consulted may not be a bright idea. Can we cache that value in the RelationData or some such place (BulkInsertState?) and use it as a hint for quickly checking whether the block is (potentially) outside the range, and discard it? Any other ideas?

The other concern I have - and TBH that's what I initially thought was the real problem, until I saw the RecoveryInProgress()-specific code - is: can this also affect stand-alone masters? The comments at MarkBufferDirtyHint() made me think so:

 * 3. This function does not guarantee that the buffer is always marked dirty
 *    (due to a race condition), so it cannot be used for important changes.
So I was working with a theory that somehow updates to the FSM page are lost because the race mentioned in the comment actually kicks in. But I'm not sure if the race is only possible when the caller is holding a SHARE lock on the buffer. When the FSM is truncated, the caller holds an EXCLUSIVE lock on the FSM buffer. So probably we're safe. I could not reproduce the issue on a stand-alone master. But probably worth checking. It might also be a good idea to inspect other callers of MarkBufferDirtyHint() and see if any of them is vulnerable, especially from standby perspective. I did one round, and couldn't see another problem. Thanks, Pavan [1] https://www.postgresql.org/message-id/CAJakt-8%3DaXa-F7uFeLAeSYhQ4wFuaX3%2BytDuDj9c8Gx6S_ou%3Dw%40mail.gmail.com [2] https://www.postgresql.org/message-id/20160601134819.30392.85...@wrigleys.postgresql.org [3]
Re: [HACKERS] Is it time to kill support for very old servers?
On Fri, Oct 7, 2016 at 4:06 PM, Tom Lane wrote:
> pg_dump alleges support for dumping from servers back to 7.0. Would v10
> be a good time to remove some of that code? It's getting harder and
> harder to even compile those ancient branches, let alone get people to
> test against them (cf. 4806f26f9). My initial thought is to cut support
> for pre-7.3 or maybe pre-7.4 servers, as that would allow removal of
> support for cases where the server lacks schemas or pg_depend, each of
> which requires a fair deal of klugery in pg_dump.

+1 for doing it, and also for picking a version that's "easily explainable". If we can say "now we support back to 8.0", that's a lot better than saying 8.1 or 7.4.

> In the same line, maybe we should kill libpq's support for V2 protocol
> (which would make the cutoff 7.4). And maybe the server's support too,
> though that wouldn't save very much code. The argument for cutting this
> isn't so much that we would remove lots of code as that we're removing
> code that never gets tested, at least not by us.

Which is a pretty strong argument in itself.

> One small problem with cutting libpq's V2 support is that the server's
> report_fork_failure_to_client() function still sends a V2-style message.
> We could change that in HEAD, certainly, but we don't really want modern
> libpq unable to parse such a message from an older server. Possibly we
> could handle that specific case with a little special-purpose code and
> still be able to take out most of fe-protocol2.c.
>
> Thoughts?

I agree we want the new libpq to be able to deal with that one from old versions, because 9.6 isn't really old yet. We *should* probably change it in head for the future anyway, but that means we have to keep it around in libpq for quite a long while anyway. Unless the special-purpose code becomes long and complicated, I think that's the best way to do it.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Robert Haas writes:
> On Fri, Oct 7, 2016 at 9:39 AM, Tom Lane wrote:
>> There's more wrong than that, as you'd notice if you tried dumping
>> a DB that actually had some LOs in it :-(. This obviously wasn't
>> tested on anything older than 9.0.

> Back in 2014, we talked about removing support for some older server versions:
> https://www.postgresql.org/message-id/24529.1415921...@sss.pgh.pa.us
> I think there have been other discussions, too, but I can't find them
> at the moment.

I just re-raised the subject:
https://www.postgresql.org/message-id/2661.1475849...@sss.pgh.pa.us

regards, tom lane
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Fri, Oct 7, 2016 at 9:39 AM, Tom Lane wrote:
> Amit Langote writes:
>> Just noticed that the getBlobs() query does not work for a 7.3 server
>> (maybe <= 7.3) due to the following change in commit 23f34fa4 [1]:
>
> Ugh.
>
>> I could fix that using the attached patch.
>
> There's more wrong than that, as you'd notice if you tried dumping
> a DB that actually had some LOs in it :-(. This obviously wasn't
> tested on anything older than 9.0.
>
> Will push a fix in a bit, as soon as I can boot up my dinosaur with
> a working 7.0 server to test that branch.

Back in 2014, we talked about removing support for some older server versions:

https://www.postgresql.org/message-id/24529.1415921...@sss.pgh.pa.us

I think there have been other discussions, too, but I can't find them at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Is it time to kill support for very old servers?
pg_dump alleges support for dumping from servers back to 7.0. Would v10 be a good time to remove some of that code? It's getting harder and harder to even compile those ancient branches, let alone get people to test against them (cf. 4806f26f9). My initial thought is to cut support for pre-7.3 or maybe pre-7.4 servers, as that would allow removal of support for cases where the server lacks schemas or pg_depend, each of which requires a fair deal of klugery in pg_dump. In the same line, maybe we should kill libpq's support for V2 protocol (which would make the cutoff 7.4). And maybe the server's support too, though that wouldn't save very much code. The argument for cutting this isn't so much that we would remove lots of code as that we're removing code that never gets tested, at least not by us. One small problem with cutting libpq's V2 support is that the server's report_fork_failure_to_client() function still sends a V2-style message. We could change that in HEAD, certainly, but we don't really want modern libpq unable to parse such a message from an older server. Possibly we could handle that specific case with a little special-purpose code and still be able to take out most of fe-protocol2.c. Thoughts? regards, tom lane
Re: [HACKERS] Switch to unnamed POSIX semaphores as our preferred sema code?
On Thu, Oct 6, 2016 at 5:07 PM, Tom Lane wrote: > Robert Haas writes: >> Alternatively, get a bigger box. :-) > > So what's it take to get access to hydra? Send me a private email with your .ssh key. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Amit Langote writes: > Just noticed that the getBlobs() query does not work for a 7.3 server > (maybe <= 7.3) due to the following change in commit 23f34fa4 [1]: Ugh. > I could fix that using the attached patch. There's more wrong than that, as you'd notice if you tried dumping a DB that actually had some LOs in it :-(. This obviously wasn't tested on anything older than 9.0. Will push a fix in a bit, as soon as I can boot up my dinosaur with a working 7.0 server to test that branch. regards, tom lane
Re: [HACKERS] Move allocation size overflow handling to MemoryContextAllocExtended()?
On Wed, Oct 5, 2016 at 2:09 PM, Andres Freund wrote: > On 2016-10-04 21:40:29 -0400, Tom Lane wrote: >> Andres Freund writes: >> > On 2016-10-05 09:38:15 +0900, Michael Paquier wrote: >> >> The existing interface of MemoryContextAlloc does not care much about >> >> anything except Size, so I'd just give the responsibility to the >> >> caller to do checks like queue != (Size) queue when queue is a uint64 >> >> for example. >> >> > Well, that duplicates the check and error message everywhere. >> >> It seems like you're on the edge of reinventing calloc(), which is not an >> API that anybody especially likes. > > I'm not sure how doing an s/Size/uint64/ in a bunch of APIs does > that. Because that'd allow us to throw an error in a number of useful > cases where we currently can't. > > I'm mostly concerned that there's a bunch of cases on 32bit platforms > where size_t is trivially overflowed. And being a bit more defensive > against that seems like a good idea. It took about a minute (10s of that > due to a typo) to find something that looks borked to me: > bool > spi_printtup(TupleTableSlot *slot, DestReceiver *self) > { > if (tuptable->free == 0) > { > /* Double the size of the pointer array */ > tuptable->free = tuptable->alloced; > tuptable->alloced += tuptable->free; > tuptable->vals = (HeapTuple *) repalloc_huge(tuptable->vals, > > tuptable->alloced * sizeof(HeapTuple)); > } > seems like it could overflow quite easily on a 32bit system. > > > People don't think about 32bit size_t a whole lot anymore, so I think by > defaulting to 64bit calculations for this kind of thing, we'll probably > prevent a number of future bugs. I think you're right, but I also think that if we start using uint64 rather than Size for byte-lengths, it will spread like kudzu through the whole system and we'll lose the documentation benefit of having sizes be called "Size". Since descriptive type names are a good thing, I don't like that very much. 
One crazy idea is to change Size to always be 64 bits and fix all the places where we translate between Size and size_t. But I'm not sure that's a good idea, either. This might be one of those cases where it's best to just accept that we're going to miss some things and fix them as we find them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Question / requests.
On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte wrote: > On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote: >> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera >> wrote: > ... >>> I wonder if the real answer isn't just to disallow -f with parallel >>> vacuuming. >> Seems like we should figure out which catalog tables are needed in >> order to perform a VACUUM, and force those to be done last and one at >> a time. > > Is the system catalog a bottleneck for people who have real use for > parallel vacuum? I don't know, but it seems like the documentation for vacuumdb currently says, more or less, "Hey, if you use -j with -f, it may not work!", which seems unacceptable to me. It should be the job of the person writing the feature to make it work in all cases, not the job of the person using the feature to work around the problem when it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM's ancillary tasks
On Thu, Oct 6, 2016 at 8:40 PM, Jeff Janes wrote: > In commit 37484ad2aacef5ec7, you changed the way that frozen tuples were > represented, so that we could make freezing more aggressive without losing > forensic evidence. But I don't think we ever did anything to actually make > the freezing more aggressive. See 3cff1879f8d03cb729368722ca823a4bf74c0cac. The objection to doing it in other cases is that it adds write-ahead log volume which might cause its own share of problems. There might be some way of getting ahead of that, though. I think if we piggyback on an existing WAL record like XLOG_HEAP2_CLEAN or XLOG_HEAP2_VISIBLE the impact might be minimal, but I haven't been dedicated enough to try to write the patch. > When I applied the up-thread patch so that pgbench_history gets autovac, > those autovacs don't actually cause any pages to get frozen until the wrap > around kicks in, even when all the tuples on the early pages should be well > beyond vacuum_freeze_min_age. If the pages are already all-visible, they'll be skipped until vacuum_freeze_table_age is reached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench vs. wait events
Robert, This contention on WAL reminds me of a similar scenario I've heard about. The fix there was to make the first process to block responsible for writing out the buffers of everyone blocked behind it. For example, suppose you have threads A, B, C, and D. While A is writing to WAL and holding the lock, B arrives and blocks; then C comes along and blocks as well; then D. Finally A finishes its write. Now you have two options for resolving this: 1) A drops its lock; B picks up the lock, writes its buffer, and drops the lock; then C gets the lock, writes its buffer, and drops it; then finally D gets the lock, writes its buffer, and drops the lock. 2) A writes out B's, C's, and D's buffers as well, then drops the lock; B, C, and D wake up, see that their respective buffers are already written, and simply return. This greatly speeds up the system. (Just be careful that A doesn't do "too much work", otherwise you can get a sort of livelock if too many threads are blocked behind it; generally, issue only one additional flush on behalf of other threads, and do not "loop until the queue is empty".) I'm not sure whether this is actually possible with the way WAL is implemented (or whether this strategy is already implemented), but if not done already it's definitely worth pursuing, as it can speed things up enormously. On 10/6/16 11:38 AM, Robert Haas wrote: Hi, I decided to do some testing on hydra (IBM-provided community resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using the newly-enhanced wait event stuff to try to get an idea of what we're waiting for during pgbench. I did 30-minute pgbench runs with various configurations, but all had max_connections = 200, shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit = off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints = on. 
During each run, I ran this psql script in another window and captured the output: \t select wait_event_type, wait_event from pg_stat_activity where pid != pg_backend_pid() \watch 0.5 Then, I used a little shell-scripting to count up the number of times each wait event occurred in the output. First, I tried scale factor 3000 with 32 clients and got these results: 1 LWLockTranche | buffer_mapping 9 LWLockNamed | CLogControlLock 14 LWLockNamed | ProcArrayLock 16 Lock | tuple 25 LWLockNamed | CheckpointerCommLock 49 LWLockNamed | WALBufMappingLock 122 LWLockTranche | clog 182 Lock | transactionid 287 LWLockNamed | XidGenLock 1300 Client | ClientRead 1375 LWLockTranche | buffer_content 3990 Lock | extend 21014 LWLockNamed | WALWriteLock 28497 | 58279 LWLockTranche | wal_insert tps = 1150.803133 (including connections establishing) What jumps out here, at least to me, is that there is furious contention on both the wal_insert locks and on WALWriteLock. Apparently, the system simply can't get WAL on disk fast enough to keep up with this workload. Relation extension locks and buffer_content locks are also pretty common, both ahead of ClientRead, a relatively uncommon wait event on this test. The load average on the system was only about 3 during this test, indicating that most processes are in fact spending most of their time off-CPU. The first thing I tried was switching to unlogged tables, which produces these results: 1 BufferPin | BufferPin 1 LWLockTranche | lock_manager 2 LWLockTranche | buffer_mapping 8 LWLockNamed | ProcArrayLock 9 LWLockNamed | CheckpointerCommLock 9 LWLockNamed | CLogControlLock 11 LWLockTranche | buffer_content 37 LWLockTranche | clog 153 Lock | tuple 388 LWLockNamed | XidGenLock 827 Lock | transactionid 1267 Client | ClientRead 20631 Lock | extend 91767 | tps = 1223.239416 (including connections establishing) If you don't look at the TPS number, these results look like a vast improvement. 
The overall amount of time spent not waiting for anything is now much higher, and the problematic locks have largely disappeared from the picture. However, the load average now shoots up to about 30, because most of the time that the backends are "not waiting for anything" they are in fact in kernel wait state D; that is, they're stuck doing I/O. This suggests that we might want to consider advertising a wait state when a backend is doing I/O, so we can measure this sort of thing. Next, I tried lowering the scale factor to something that fits in shared buffers. Here are the results at scale factor 300: 14 Lock|
Re: [HACKERS] Complete LOCK TABLE ... IN ACCESS|ROW|SHARE
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested This was my first review, done with help from Gerdan. I did some tests and found nothing unusual. The feature is implemented correctly: the patch passes all regression tests and enables the new tab completions specified. I first verified, before applying the patch, that the completions did not yet exist; the evidence follows: postgres=# lock teste IN ACCESS EXCLUSIVE MODEEXCLUSIVE MODE ROW SHARE MODE ACCESS SHARE MODEROW EXCLUSIVE MODE SHARE MODE postgres=# lock teste IN ACCESS EXCLUSIVE MODEEXCLUSIVE MODE ROW SHARE MODE ACCESS SHARE MODEROW EXCLUSIVE MODE SHARE MODE postgres=# lock teste IN ACCESS postgres=# lock teste IN ACCESS postgres=# lock t TABLE teste postgres=# lock teste IN ACCESS EXCLUSIVE MODEEXCLUSIVE MODE ROW SHARE MODE ACCESS SHARE MODEROW EXCLUSIVE MODE SHARE MODE postgres=# lock teste IN ACCESS EXCLUSIVE MODEEXCLUSIVE MODE ROW SHARE MODE ACCESS SHARE MODEROW EXCLUSIVE MODE SHARE MODE postgres=# lock TABLE information_schema. pg_temp_1. pg_toast_temp_1. teste pg_catalog. pg_toast.public. postgres=# lock TABLE teste IN ACCESS EXCLUSIVE MODEEXCLUSIVE MODE ROW SHARE MODE ACCESS SHARE MODEROW EXCLUSIVE MODE SHARE MODE postgres=# lock TABLE teste IN ACCESS postgres=# lock TABLE teste IN ACCESS postgres=# lock TABLE teste IN ACCESS postgres=# lock TABLE teste IN SHARE postgres=# lock TABLE teste IN ROW postgres=# lock TABLE teste IN ROW postgres=# lock TABLE teste IN ROW - After applying the patch, the new completions appear. 
postgres=# lock table teste IN ACCESS EXCLUSIVE MODE SHARE MODE postgres=# lock table teste IN ACCESS EXCLUSIVE MODEEXCLUSIVE MODE ROW SHARE MODE SHARE ROW EXCLUSIVE MODE ACCESS SHARE MODEROW EXCLUSIVE MODE SHARE MODE SHARE UPDATE EXCLUSIVE MODE postgres=# lock table teste IN ACCESS EXCLUSIVE MODE SHARE MODE postgres=# lock table teste IN ACCESS EXCLUSIVE MODE SHARE MODE postgres=# lock table teste IN ROW EXCLUSIVE MODE SHARE MODE postgres=# lock table teste IN SHARE MODE ROW EXCLUSIVE MODE UPDATE EXCLUSIVE MODE postgres=# lock table teste IN SHARE The new status of this patch is: Ready for Committer
Re: [HACKERS] pg_rewind and ssl test suites don't work if '.' is not in @INC
On Fri, Oct 7, 2016 at 8:54 PM, Heikki Linnakangas wrote: > I think we should fix this by the attached. Any better ideas? I would suggest updating PG_PROVE_FLAGS in src/Makefile.global.in instead, because otherwise any test suite having its own .pm file will fail. It is also necessary to refresh $ENV{PERL5LIB} in vcregress.pl. > I'm a bit surprised no-one else has reported this yet. Have I missed a > report? Andres has reported that a couple of weeks back, on Debian testing like you: https://www.postgresql.org/message-id/20160908204529.flg6nivjuwp5v...@alap3.anarazel.de -- Michael
[HACKERS] pg_rewind and ssl test suites don't work if '.' is not in @INC
On my system running Debian stretch (= testing): ~/git-sandbox-pgsql/96stable/src/bin/pg_rewind (REL9_6_STABLE)$ make check rm -rf '/home/heikki/git-sandbox-pgsql/96stable'/tmp_install /bin/mkdir -p '/home/heikki/git-sandbox-pgsql/96stable'/tmp_install/log make -C '../../..' DESTDIR='/home/heikki/git-sandbox-pgsql/96stable'/tmp_install install >'/home/heikki/git-sandbox-pgsql/96stable'/tmp_install/log/install.log 2>&1 rm -rf /home/heikki/git-sandbox-pgsql/96stable/src/bin/pg_rewind/tmp_check/log cd . && TESTDIR='/home/heikki/git-sandbox-pgsql/96stable/src/bin/pg_rewind' PATH="/home/heikki/git-sandbox-pgsql/96stable/tmp_install/home/heikki/pgsql.96stable/bin:$PATH" LD_LIBRARY_PATH="/home/heikki/git-sandbox-pgsql/96stable/tmp_install/home/heikki/pgsql.96stable/lib" PGPORT='65432' PG_REGRESS='/home/heikki/git-sandbox-pgsql/96stable/src/bin/pg_rewind/../../../src/test/regress/pg_regress' prove -I ../../../src/test/perl/ --verbose t/*.pl t/001_basic.pl 1..8 Can't locate RewindTest.pm in @INC (you may need to install the RewindTest module) (@INC contains: /home/heikki/git-sandbox-pgsql/96stable/src/bin/pg_rewind/../../../src/test/perl /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.24.1 /usr/local/share/perl/5.24.1 /usr/lib/x86_64-linux-gnu/perl5/5.24 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.24 /usr/share/perl/5.24 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at t/001_basic.pl line 6. BEGIN failed--compilation aborted at t/001_basic.pl line 6. That's not nice. Perl's @INC usually contains the current directory, and those test suites rely on that, but that changed recently on Debian stretch, and other distributions are likely to follow. There's more information on this at https://www.debian.org/security/2016/dsa-3628. I think we should fix this by the attached. Any better ideas? I'm a bit surprised no-one else has reported this yet. Have I missed a report? 
- Heikki >From d6f4eea7ee41a1d2ab0335bd126bf2cca7017cdd Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 7 Oct 2016 14:40:43 +0300 Subject: [PATCH 1/1] Make TAP test suites to work, when @INC does not contain current dir. Perl and/or new Linux distributions are starting to remove "." from the @INC list by default. That breaks the pg_rewind and ssl test suites, which use helper perl modules that reside in the same directory. To fix, add the correct directory explicitly to prove's include dir. --- src/bin/pg_rewind/Makefile | 2 ++ src/test/ssl/Makefile | 2 ++ 2 files changed, 4 insertions(+) diff --git a/src/bin/pg_rewind/Makefile b/src/bin/pg_rewind/Makefile index d03a0a2..af4a800 100644 --- a/src/bin/pg_rewind/Makefile +++ b/src/bin/pg_rewind/Makefile @@ -18,6 +18,8 @@ include $(top_builddir)/src/Makefile.global PG_CPPFLAGS = -I$(libpq_srcdir) PG_LIBS = $(libpq_pgport) +PG_PROVE_FLAGS += -I$(top_srcdir)/$(subdir) + override CPPFLAGS := -I$(libpq_srcdir) -DFRONTEND $(CPPFLAGS) OBJS = pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \ diff --git a/src/test/ssl/Makefile b/src/test/ssl/Makefile index 2b04d82..c65791c 100644 --- a/src/test/ssl/Makefile +++ b/src/test/ssl/Makefile @@ -13,6 +13,8 @@ subdir = src/test/ssl top_builddir = ../../.. include $(top_builddir)/src/Makefile.global +PG_PROVE_FLAGS += -I$(top_srcdir)/$(subdir) + CERTIFICATES := server_ca server-cn-and-alt-names \ server-cn-only server-single-alt-name server-multiple-alt-names \ server-no-names server-revoked server-ss \ -- 2.9.3
Re: [HACKERS] Declarative partitioning - another take
On 2016/10/07 18:27, Ashutosh Bapat wrote: > It's allowed to specify a non-default opclass in the partition by clause, > but I do not see any testcase testing the same. Can you please add > one. OK, I will add some tests related to that. Thanks, Amit
Re: [HACKERS] pg_rewind: Should abort if both --source-pgdata and --source-server are specified
On 10/07/2016 01:34 PM, Michael Banck wrote: Hi, ISTM that pg_rewind's --source-pgdata and --source-server options are mutually exclusive, and the synopsis in the documentation seems to indicate that as well: |pg_rewind [option...] {-D | --target-pgdata} directory |{--source-pgdata=directory | --source-server=connstr} However, there is no such check in the code. I've seen people assume --source-pgdata is supposed to be the data directory location on the remote server if they specify --source-server as well, and are then confused by error messages. So I think pg_rewind should abort in this case. Patch for that attached. Agreed. Committed, thanks! - Heikki
Re: [HACKERS] Radix tree for character conversion
On 10/07/2016 11:36 AM, Kyotaro HORIGUCHI wrote: The radix conversion function and map conversion script became more generic than the previous state, so I could easily add radix conversion of EUC_JP in addition to Shift JIS. nm -S said that the size of the radix tree data for sjis->utf8 conversion is 34kB and that for utf8->sjis is 46kB (eucjp->utf8 57kB, utf8->eucjp 93kB). LUmapSJIS and ULmapSJIS were 62kB and 59kB, and LUmapEUC_JP and ULmapEUC_JP were 106kB and 105kB. If I'm not missing something, the radix tree is faster and requires less memory. Cool! Currently the tree structure is divided into several elements: one for 2-byte codes, others for 3-byte and 4-byte codes, and an output table. All but the last could logically and technically be merged into a single table, but that makes the generator script far more complex than it is now. I no longer want to play hide'n seek with complex Perl objects.. I think that's OK. There isn't really anything to gain by merging them. It might be better to make this a native feature of the core. Currently the helper function is in core, but that function is given as conv_func on calling LocalToUtf. Yeah, I think we want to completely replace the current binary-search based code with this. I would rather maintain just one mechanism. The current implementation uses the *.map files of pg_utf_to_local as input. That doesn't seem good, but the radix tree files are completely uneditable. Providing custom loading functions for every source instead of load_chartable() would be the way to go. # However, utf8_to_sjis.map, for example, doesn't seem to have been # generated from the source mentioned in UCS_to_SJIS.pl Ouch. We should find and document an authoritative source for all the mappings we have... I think the next steps here are: 1. Find an authoritative source for all the existing mappings. 2. Generate the radix tree files directly from the authoritative sources, instead of the existing *.map files. 3. 
Completely replace the existing binary-search code with this. - Heikki
[HACKERS] pg_rewind: Should abort if both --source-pgdata and --source-server are specified
Hi, ISTM that pg_rewind's --source-pgdata and --source-server options are mutually exclusive, and the synopsis in the documentation seems to indicate that as well: |pg_rewind [option...] {-D | --target-pgdata} directory |{--source-pgdata=directory | --source-server=connstr} However, there is no such check in the code. I've seen people assume --source-pgdata is supposed to be the data directory location on the remote server if they specify --source-server as well, and are then confused by error messages. So I think pg_rewind should abort in this case. Patch for that attached. Thoughts? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c index 5fdd4c5..dd62dd0 100644 --- a/src/bin/pg_rewind/pg_rewind.c +++ b/src/bin/pg_rewind/pg_rewind.c @@ -162,6 +162,13 @@ main(int argc, char **argv) exit(1); } + if (datadir_source != NULL && connstr_source != NULL) + { + fprintf(stderr, _("%s: only one of --source-pgdata or --source-server can be specified\n"), progname); + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); + exit(1); + } + if (datadir_target == NULL) { fprintf(stderr, _("%s: no target data directory specified (--target-pgdata)\n"), progname);
[HACKERS] pg_stat_statements and non default search_path
Hello, I've faced multiple times a painful limitation with pg_stat_statements. Queries are normalized based on the textual SQL statements, and the queryid is computed using objects' oids. So for instance with different search_path settings, we can end up with the same normalized query but different queryids, and there's no way to know what are the actual relations each query is using. It also means that it can be almost impossible to use the normalized query to replay a query. I see two ways of fixing this. First one would be to store normalized queries while fully qualifying the relation's names. After a quick look this can't be done without storing at least a token location in RangeTblEntry nodes, which sounds like a bad idea. The other way would be to store the value of search_path with each pgss entry (either in shared_memory or in the external file). Is it something that we should fix, and if yes is any of this acceptable, or does someone see another way to solve this problem? Regards. -- Julien Rouhaud http://dalibo.com - http://dalibo.org
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/05/2016 10:03 AM, Amit Kapila wrote: On Wed, Oct 5, 2016 at 12:05 PM, Tomas Vondra wrote: Hi, After collecting a lot more results from multiple kernel versions, I can confirm that I see a significant improvement with 128 and 192 clients, roughly by 30%: 64 128 192 master 62482 43181 50985 granular-locking 61701 59611 47483 no-content-lock 62650 59819 47895 group-update 63702 64758 62596 But I only see this with Dilip's workload, and only with pre-4.3.0 kernels (the results above are from kernel 3.19). That appears positive. I got access to a large machine with 72/144 cores (thanks to Oleg and Alexander from Postgres Professional), and I'm running the tests on that machine too. Results from Dilip's workload (with scale 300, unlogged tables) look like this: 32 64 128 192 224 256 288 master 104943 128579 72167 100967 66631 97088 63767 granular-locking 103415 141689 83780 120480 71847 115201 67240 group-update 105343 144322 92229 130149 81247 126629 76638 no-content-lock 103153 140568 80101 119185 70004 115386 66199 So there's some 20-30% improvement for >= 128 clients. But what I find much more intriguing is the zig-zag behavior. I mean, 64 clients give ~130k tps, 128 clients only give ~70k but 192 clients jump up to >100k tps again, etc. FWIW I don't see any such behavior on pgbench, and all those tests were done on the same cluster. With 4.5.5, results for the same benchmark look like this: 64 128 192 master 35693 39822 42151 granular-locking 35370 39409 41353 no-content-lock 36201 39848 42407 group-update 35697 39893 42667 That seems like a fairly bad regression in the kernel, although I have not identified the feature/commit causing it (and it's also possible the issue lies somewhere else, of course). With regular pgbench, I see no improvement on any kernel version. 
For example on 3.19 the results look like this: 64 128 192 master 54661 61014 59484 granular-locking 55904 62481 60711 no-content-lock 56182 62442 61234 group-update 55019 61587 60485 Are the above results with synchronous_commit=off? No, but I can do that. I haven't done much more testing (e.g. with -N to eliminate collisions on branches) yet, let's see if it changes anything. Yeah, let us see how it behaves with -N. Also, I think we could try at higher scale factor? Yes, I plan to do that. In total, I plan to test combinations of: (a) Dilip's workload and pgbench (regular and -N) (b) logged and unlogged tables (c) scale 300 and scale 3000 (both fit into RAM) (d) sync_commit=on/off regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Declarative partitioning - another take
It's allowed to specify a non-default opclass in the partition by clause, but I do not see any testcase testing the same. Can you please add one. On Fri, Oct 7, 2016 at 2:50 PM, Amit Langote wrote: > On 2016/10/07 17:33, Rajkumar Raghuwanshi wrote: >> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote: >> I have applied latest patches, getting some error and crash, please check >> if you are also able to reproduce the same. >> >> Observation1 : Not able to create index on partition table. >> -- >> CREATE TABLE rp (c1 int, c2 int) PARTITION BY RANGE(c1); >> CREATE TABLE rp_p1 PARTITION OF rp FOR VALUES START (1) END (10); >> CREATE TABLE rp_p2 PARTITION OF rp FOR VALUES START (10) END (20); >> >> CREATE INDEX idx_rp_c1 on rp(c1); >> ERROR: cannot create index on partitioned table "rp" > > This one is a new behavior as I mentioned earlier in my previous email. > >> Observation2 : Getting cache lookup failed error for multiple column range >> partition >> -- >> CREATE TABLE rp1_m (c1 int, c2 int) PARTITION BY RANGE(c1, ((c1 + c2)/2)); >> >> CREATE TABLE rp1_m_p1 PARTITION OF rp1_m FOR VALUES START (1, 1) END (10, >> 10); >> ERROR: cache lookup failed for attribute 0 of relation 16429 >> >> Observation3 : Getting server crash with multiple column range partition >> -- >> CREATE TABLE rp2_m (c1 int, c2 int) PARTITION BY RANGE(((c2 + c1)/2), c2); >> CREATE TABLE rp2_m_p1 PARTITION OF rp2_m FOR VALUES START (1, 1) END (10, >> 10); >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> The connection to the server was lost. Attempting reset: Failed. >> !> > > Fixed. Should have been caught when running the regression tests after I > made changes to 0001 to address some review comments (apparently there was > no test in 0003 that would've caught this, so added a new one, thanks!). > > > Attached updated patches. 
> Thanks,
> Amit

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On 10/06/2016 07:36 AM, Pavan Deolasee wrote:

On Wed, Oct 5, 2016 at 1:43 PM, Tomas Vondra wrote:

...

I can confirm the significant speedup, often by more than 75% (depending on the number of indexes, whether the data set fits into RAM, etc.). Similarly for the amount of WAL generated, although that's a bit more difficult to evaluate due to full_page_writes. I'm not going to send detailed results, as that probably does not make much sense at this stage of the development - I can repeat the tests once the open questions get resolved.

Sure. Anything that stands out? Any regression that you see? I'm not sure if your benchmarks exercise the paths which might show overheads without any tangible benefits. For example, I wonder if a test with many indexes where most of them get updated, and then querying the table via those updated indexes, could be one such test case.

No, nothing that would stand out. Let me explain what benchmark(s) I've done. I've made some minor mistakes when running the benchmarks, so I plan to rerun them and post the results after that. So let's take the data with a grain of salt. My goal was to compare the current non-HOT behavior (updating all indexes) with WARM (updating only indexes on modified columns), and I've taken two approaches:

1) fixed number of indexes, update a variable number of columns

Create a table with 8 secondary indexes and then run a bunch of benchmarks updating an increasing number of columns. So the first run did

    UPDATE t SET c1 = c1+1 WHERE id = :id;

while the second did

    UPDATE t SET c1 = c1+1, c2 = c2+1 WHERE id = :id;

and so on, up to updating all the columns in the last run. I've used multiple scripts to update all the columns / indexes uniformly (essentially using multiple "-f" flags with pgbench). The runs were fairly long (2h, enough to get stable behavior).
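The core WARM idea being benchmarked - touch only the indexes whose key columns were modified, where a regular non-HOT update has to touch all of them - can be sketched roughly like this (hypothetical structures for illustration, not the patch's actual code):

```python
def indexes_to_update(indexes, modified_columns):
    """Non-HOT behavior updates every index; WARM touches only the
    indexes whose key columns intersect the modified columns."""
    return [name for name, cols in indexes.items()
            if cols & modified_columns]

# A table with 8 secondary indexes, one per column c1..c8, as in the benchmark.
indexes = {f"idx_c{i}": {f"c{i}"} for i in range(1, 9)}

# First benchmark run updates only c1: WARM touches 1 index instead of 8.
print(indexes_to_update(indexes, {"c1"}))             # ['idx_c1']
# Second run updates c1 and c2: 2 indexes instead of 8.
print(len(indexes_to_update(indexes, {"c1", "c2"})))  # 2
```

This also makes the shape of the first results table plausible: the benefit shrinks as the number of updated columns approaches the number of indexed columns.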
For a small data set (fits into RAM), the results look like this (tps):

         master   patched     diff
    1      5994      8490     +42%
    2      4347      7903     +81%
    3      4340      7400     +70%
    4      4324      6929     +60%
    5      4256      6495     +52%
    6      4253      5059     +19%
    7      4235      4534      +7%
    8      4194      4237      +1%

and the amount of WAL generated (after correction for tps difference) looks like this (numbers are MBs):

         master   patched     diff
    1     27257     18508     -32%
    2     21753     14599     -33%
    3     21912     15864     -28%
    4     22021     17135     -22%
    5     21819     18258     -16%
    6     21929     20659      -6%
    7     21994     22234      +1%
    8     21851     23267      +6%

So this is a quite significant difference. I'm pretty sure the minor WAL increase for the last two runs is due to full page writes (which also affect the preceding runs, making the WAL reduction smaller than the tps increase).

I do have results for larger data sets (>RAM); the results are very similar, although the speedup seems a bit smaller. But I need to rerun those.

2) single-row update, adding indexes between runs

This is kinda the opposite of the previous approach, i.e. transactions always update a single column (multiple scripts to update the columns uniformly), but new indexes are added between runs. The results (for a large data set, exceeding RAM) look like this:

         master   patched     diff
    0       954      1404     +47%
    1       701      1045     +49%
    2       484       816     +70%
    3       346       683     +97%
    4       248       608    +145%
    5       190       525    +176%
    6       152       397    +161%
    7       123       315    +156%
    8       123       270    +119%

So this looks really interesting.

There's a lot of useful and important feedback in the thread(s) so far, particularly the descriptions of various failure cases. I think it'd be very useful to collect those examples and turn them into regression tests - that's something the patch should include anyway.

Sure. I added only a handful of test cases which I knew the regression suite isn't covering. But I'll write more of them. One good thing is that the code gets heavily exercised even during regression. I caught and fixed multiple bugs running regression. I'm not saying that's enough, but it certainly gives some confidence.
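Since the patched runs execute more transactions in the same 2h window, the raw WAL volumes aren't directly comparable. One plausible form of the "correction for tps difference" mentioned above is to scale each run's WAL to a common throughput; the thread doesn't show the exact correction used, so the following is only a sketch under that assumption:

```python
def wal_per_txn(wal_mb, tps, seconds=7200):
    """WAL volume divided by the number of transactions executed
    (2h runs, as described above)."""
    return wal_mb / (tps * seconds)

def corrected_wal(wal_mb, tps, reference_tps):
    """Scale a run's WAL volume to a common (reference) throughput,
    so runs with different tps become comparable."""
    return wal_mb * reference_tps / tps

# A run at half the reference throughput gets its WAL doubled when
# normalized to the same transaction count.
print(corrected_wal(100.0, 50.0, 100.0))  # 200.0
```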
I don't see any changes to src/test in the patch, so I'm not sure what you mean when you say you added a handful of test cases?

and update:

    update t set a = a+1, b = b+1;

which has to update all indexes on the table, but:

    select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables

     n_tup_upd | n_tup_hot_upd
    -----------+---------------
          1000 |          1000

So it's still counted as "WARM" - does it make sense?

No, it does not. The code currently just marks any update as a WARM update if the
Re: [HACKERS] VACUUM's ancillary tasks
On Fri, Oct 7, 2016 at 6:46 AM, Robert Haas wrote:
> On Thu, Oct 6, 2016 at 3:56 PM, Jeff Janes wrote:
>>> Sure, I could handle each case separately, but the goal of this patch
>>> (as hinted at in the Subject) is to generalize all the different tasks
>>> we've been giving to VACUUM. The only missing piece is what the first
>>> patch addresses; which is insert-mostly tables never getting vacuumed.
>>
>> I don't buy the argument that we should do this in order to be "general".
>> Those other pieces are present to achieve a specific job, not out of
>> generality.
>
> +1.
>
>> If we want to have something to vacuum insert-mostly tables for the sake of
>> the index-only-scans, then I don't think we can ignore the fact that the
>> visibility map is page based, not tuple based. If we insert 10,000 tuples
>> into a full table and they all go into 100 pages at the end, that is very
>> different than inserting 10,000 tuples which each go into a separate page
>> (because each page has that amount of freespace). In one case you have
>> 10,000 tuples not marked as all visible, in the other case you have
>> 1,000,000 not marked as all visible.
>
> +1.
>
>> Also, I think that doing more counts which get amalgamated into the same
>> threshold, rather than introducing another parameter, is a bad thing. I
>> have insert-mostly, most of the time, tables which are never going to
>> benefit from index-only-scans, and I don't want to pay the cost of them
>> getting vacuumed just because of some inserts, when I will never get a
>> benefit out of it. I could turn autovacuum off for those tables, but then I
>> would have to remember to manually intervene on the rare occasion they do
>> get updates or deletes. I want to be able to manually pick which tables I
>> sign up for this feature (and then forget it), not manually pick which ones
>> to exempt from it and have to constantly review that.
> Of course, if you do that, then what will happen is eventually it will
> be time to advance relfrozenxid for that relation, and you'll get a
> giant soul-crushing VACUUM of doom, rather than a bunch of small,
> hopefully-innocuous VACUUMs. I've been wondering if it would make sense
> to trigger vacuums based on inserts based on a *fixed* number of
> pages, rather than a percentage of the table. Say, for example,
> whenever we have 64MB of not-all-visible pages, we VACUUM.

+1

> There are some difficulties:
>
> 1. We don't want to vacuum too early. For example, a bulk load may
> vastly inflate the size of a table, but vacuuming it while the load is
> in progress will be useless. You can maybe avoid that problem by
> basing this on statistics only reported at the end of the transaction,
> but there's another, closely-related issue: vacuuming right after the
> transaction completes may be useless, too, if there are old snapshots
> still in existence that render the newly-inserted tuples
> not-all-visible.

If a dummy xid could be generated before starting a vacuum for maintaining the visibility map (triggered by the amount of cleared vm pages), that vacuum could wait for the old transactions to finish.

> 2. We don't want to trigger index vacuuming for a handful of dead
> tuples, or at least not too often. I've previously suggested
> requiring a certain minimum number of dead index tuples before index
> vacuuming occurs; prior to that, we'd just truncate to dead line
> pointers and leave those for the next vacuum cycle. This might need
> some refinement - should it be page-based? - but the basic idea still
> seems sound.

Where do we leave the dead line pointers?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
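Jeff's page-granularity point and Robert's fixed-size trigger can be illustrated together with a small sketch. The 64MB figure and the 8kB block size follow the discussion and PostgreSQL's default BLCKSZ; the function itself is illustrative, not actual autovacuum code:

```python
BLCKSZ = 8192  # default PostgreSQL block size, in bytes

def should_trigger_vacuum(not_all_visible_pages, threshold_bytes=64 * 1024 * 1024):
    """Fixed-size trigger: vacuum once the not-all-visible pages exceed
    a fixed byte budget, rather than a fraction of the table size."""
    return not_all_visible_pages * BLCKSZ >= threshold_bytes

# The same 10,000 inserted tuples can clear very different numbers of
# visibility-map bits depending on where they land:
dense = 100      # all inserts packed into 100 new pages at the table's end
sparse = 10000   # each insert lands on a different page with free space

print(should_trigger_vacuum(dense))   # 100 pages * 8kB = ~0.8MB, below budget
print(should_trigger_vacuum(sparse))  # 10000 pages * 8kB = ~78MB, over budget
```

A tuple-count threshold would treat both cases identically, which is exactly the objection to amalgamating insert counts into the existing threshold.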
Re: [HACKERS] Declarative partitioning - another take
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote:
>
> Attached revised patches. Also, includes a fix for an issue reported by
> Rajkumar Raghuwanshi [1] which turned out to be a bug in one of the later
> patches. I will now move on to addressing the comments on patch 0003.
>
> Thanks a lot for the review!
>
> Thanks,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/5dded2f1-c7f6-e7fc-56b
> 5-23ab59495...@lab.ntt.co.jp

Hi,

I have applied the latest patches, getting some errors and a crash, please check if you are also able to reproduce the same.

Observation1 : Not able to create index on partition table.
--
CREATE TABLE rp (c1 int, c2 int) PARTITION BY RANGE(c1);
CREATE TABLE rp_p1 PARTITION OF rp FOR VALUES START (1) END (10);
CREATE TABLE rp_p2 PARTITION OF rp FOR VALUES START (10) END (20);

CREATE INDEX idx_rp_c1 on rp(c1);
ERROR: cannot create index on partitioned table "rp"

Observation2 : Getting cache lookup failed error for multiple column range partition
--
CREATE TABLE rp1_m (c1 int, c2 int) PARTITION BY RANGE(c1, ((c1 + c2)/2));

CREATE TABLE rp1_m_p1 PARTITION OF rp1_m FOR VALUES START (1, 1) END (10, 10);
ERROR: cache lookup failed for attribute 0 of relation 16429

Observation3 : Getting server crash with multiple column range partition
--
CREATE TABLE rp2_m (c1 int, c2 int) PARTITION BY RANGE(((c2 + c1)/2), c2);
CREATE TABLE rp2_m_p1 PARTITION OF rp2_m FOR VALUES START (1, 1) END (10, 10);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] VACUUM's ancillary tasks
On Fri, Oct 7, 2016 at 9:40 AM, Jeff Janes wrote:
> On Thu, Oct 6, 2016 at 2:46 PM, Robert Haas wrote:
>>
>> > Also, I think that doing more counts which get amalgamated into the same
>> > threshold, rather than introducing another parameter, is a bad thing. I
>> > have insert-mostly, most of the time, tables which are never going to
>> > benefit from index-only-scans, and I don't want to pay the cost of them
>> > getting vacuumed just because of some inserts, when I will never get a
>> > benefit out of it. I could turn autovacuum off for those tables, but
>> > then I would have to remember to manually intervene on the rare occasion
>> > they do get updates or deletes. I want to be able to manually pick which
>> > tables I sign up for this feature (and then forget it), not manually
>> > pick which ones to exempt from it and have to constantly review that.
>>
>> Of course, if you do that, then what will happen is eventually it will
>> be time to advance relfrozenxid for that relation, and you'll get a
>> giant soul-crushing VACUUM of doom, rather than a bunch of small,
>> hopefully-innocuous VACUUMs.
>
> I think I will get the soul-crushing vacuum of doom anyway, if the database
> lasts that long. For one reason, in my case while updates and deletes are
> rare, they are common enough that the frozen vm bits from early vacuums will
> be mostly cleared before the vacuum of doom comes around.
>
> For a second reason, I don't think the frozen bit in the vm actually gets
> set by much of anything other than an autovacuum-for-wraparound or a manual
> VACUUM FREEZE.

Yeah, the freeze map would be effective especially for static tables.
Since we can skip vacuuming frozen pages, and freezing tuples is usually not a big pain, we might want to make autovacuum more aggressive about freezing pages by reducing the default value of vacuum_freeze_min_age.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Thu, Oct 6, 2016 at 2:52 PM, Amit Langote wrote:
> On 2016/10/06 17:45, Ashutosh Bapat wrote:
>> On Thu, Oct 6, 2016 at 1:34 PM, Masahiko Sawada wrote:
>>> On Thu, Oct 6, 2016 at 1:41 PM, Ashutosh Bapat wrote:
> My understanding is that basically the local server can not return
> COMMIT to the client until the 2nd phase is completed.

If we do that, the local server may not return to the client at all, if the foreign server crashes and never comes up. Practically, it may take much longer to finish a COMMIT, depending upon how long it takes for the foreign server to reply to a COMMIT message.

>>> Yes, I think 2PC behaves so, please refer to [1].
>>> To prevent the local server from being stuck forever due to a
>>> communication failure, we could provide a timeout on the coordinator
>>> side or on the participant side.
>>
>> This too looks like a heuristic and shouldn't be the default
>> behaviour, and hence not part of the first version of this feature.
>
> At any rate, the coordinator should not return to the client until after
> the 2nd phase is completed, which was the original point. If COMMIT
> taking longer is an issue, then it could be handled with one of the
> approaches mentioned so far (even if not in the first version), but no
> version of this feature should return COMMIT to the client after finishing
> only the first phase. Am I missing something?

There is a small time window between the actual COMMIT and the commit message being returned. An actual commit happens when we insert a WAL record saying transaction X committed, and then we return to the client saying a COMMIT happened. Note that a transaction may be committed but we will never return to the client with a commit message, because the connection was lost or the server crashed. I hope we agree on this.

COMMITTING the foreign prepared transactions happens after we COMMIT the local transaction.
If we do it before COMMITTING the local transaction and the local server crashes, we will roll back the local transaction during subsequent recovery while the foreign segments have committed, resulting in an inconsistent state.

If we are successful in COMMITTING the foreign transactions during the post-commit phase, the COMMIT message will be returned after we have committed all foreign transactions. But in case we can not reach a foreign server and the request times out, we can not revert our decision that we are going to commit the transaction. That's my answer to the timeout-based heuristic.

I don't see much point in holding up post-commit processing for a non-responsive foreign server, which may not respond for days together. Can you please elaborate a use case? Which commercial transaction manager does that?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
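The ordering described above - prepare on every foreign server, make the local commit durable, then commit the prepared foreign transactions in a post-commit phase whose decision is never reverted - can be modeled with a toy coordinator (a hypothetical sketch, not the actual patch's code):

```python
class TwoPhaseCoordinator:
    """Toy model of the commit ordering discussed above: once the local
    commit record is written, the outcome is final; foreign servers that
    are unreachable in phase 2 leave prepared transactions behind to be
    resolved later, rather than the decision being reverted."""

    def __init__(self, foreign_servers):
        self.foreign_servers = foreign_servers
        self.unresolved = []          # prepared but not yet committed remotes
        self.local_committed = False

    def commit(self, send):
        # Phase 1: PREPARE TRANSACTION everywhere. A failure here still
        # permits a clean rollback on all servers.
        for s in self.foreign_servers:
            if not send(s, "PREPARE"):
                for p in self.foreign_servers:
                    send(p, "ROLLBACK")
                return "ABORTED"
        # Local commit: in PostgreSQL terms, the WAL commit record makes
        # the decision durable; from here on we may only go forward.
        self.local_committed = True
        # Phase 2 (post-commit): COMMIT PREPARED on each server. Failures
        # are queued for later resolution, never reverted.
        for s in self.foreign_servers:
            if not send(s, "COMMIT PREPARED"):
                self.unresolved.append(s)
        return "COMMITTED"

def flaky_send(server, msg):
    # Simulated network: server "b" is unreachable during phase 2.
    return not (server == "b" and msg == "COMMIT PREPARED")

coord = TwoPhaseCoordinator(["a", "b"])
print(coord.commit(flaky_send))  # COMMITTED
print(coord.unresolved)          # ['b']
```

This is why a phase-2 timeout can only defer resolution, not change the outcome: by the time "b" fails to respond, the transaction is already durably committed locally.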