Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote: I still haven't categorically ruled out pl/sh yet; that's something to keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the problem went away. So again, it looks like this might be a false alarm. A pl/sh driven failure might still be interesting if it's coming from the internal calls it's making, so I'm still chasing it down. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 28 January 2015 at 14:03, Robert Haas robertmh...@gmail.com wrote: The problem here, as I see it, is that we're flying blind. If there's just one spindle, I think it's got to be right to read the relation sequentially. But if there are multiple spindles, it might not be, but it seems hard to predict what we should do. We don't know what the RAID chunk size is or how many spindles there are, so any guess as to how to chunk up the relation and divide up the work between workers is just a shot in the dark. Can't the planner take effective_io_concurrency into account? Thom
Re: [HACKERS] Safe memory allocation functions
On Tue, Jan 27, 2015 at 5:34 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-27 17:27:53 +0900, Michael Paquier wrote: Alvaro Herrera wrote: So how about something like #define ALLOCFLAG_HUGE 0x01 #define ALLOCFLAG_NO_ERROR_ON_OOM 0x02 void * MemoryContextAllocFlags(MemoryContext context, Size size, int flags); The flag for huge allocations may be useful, but I don't actually see much value in the flag ALLOC_NO_OOM if the stuff in aset.c unconditionally returns NULL in case of an OOM and we let palloc complain about an OOM when allocation returns NULL. Is there something I am missing, perhaps? I guess the idea is to have a *user-facing* MemoryContextAllocExtended() that can do both huge and no-oom allocations. Otherwise we need palloc-like wrappers for all combinations. We're certainly not just going to ignore memory allocation failures generally in MemoryContextAllocExtended(). As a result of all the comments on this thread, here are 3 patches implementing incrementally the different ideas from everybody: 1) 0001 modifies aset.c to unconditionally return NULL in case of an OOM instead of reporting an error. All the OOM error reports are moved to mcxt.c (MemoryContextAlloc* and palloc*). 2) 0002 adds the noerror routines for frontend and backend. 3) 0003 adds MemoryContextAllocExtended that can be called with the following control flags: #define ALLOC_HUGE 0x01 /* huge allocation */ #define ALLOC_ZERO 0x02 /* clear allocated memory */ #define ALLOC_NO_OOM 0x04 /* no failure if out-of-memory */ #define ALLOC_ALIGNED 0x08 /* request length suitable for MemSetLoop */ This groups MemoryContextAlloc, MemoryContextAllocHuge, MemoryContextAllocZero and MemoryContextAllocZeroAligned under the same central routine.
Regards, -- Michael From 337c439554ce66486cf9d29dced6c72d034b8f8d Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Wed, 28 Jan 2015 22:10:13 +0900 Subject: [PATCH 1/3] Make allocation functions return NULL on OOM In counterpart, higher-level APIs in mcxt.c report an explicit error message when memory requests cannot be completed. --- src/backend/utils/mmgr/aset.c | 30 --- src/backend/utils/mmgr/mcxt.c | 48 +++ 2 files changed, 61 insertions(+), 17 deletions(-) diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 85b3c9a..bf6f09a 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -642,8 +642,8 @@ AllocSetDelete(MemoryContext context) /* * AllocSetAlloc - * Returns pointer to allocated memory of given size; memory is added - * to the set. + * Returns pointer to allocated memory of given size or NULL if + * request could not be completed; memory is added to the set. * * No request may exceed: * MAXALIGN_DOWN(SIZE_MAX) - ALLOC_BLOCKHDRSZ - ALLOC_CHUNKHDRSZ @@ -673,10 +673,7 @@ AllocSetAlloc(MemoryContext context, Size size) if (block == NULL) { MemoryContextStats(TopMemoryContext); - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of memory"), - errdetail("Failed on request of size %zu.", size))); + return NULL; } block->aset = set; block->freeptr = block->endptr = ((char *) block) + blksize; @@ -867,10 +864,7 @@ AllocSetAlloc(MemoryContext context, Size size) if (block == NULL) { MemoryContextStats(TopMemoryContext); - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of memory"), - errdetail("Failed on request of size %zu.", size))); + return NULL; } block->aset = set; @@ -1002,9 +996,10 @@ AllocSetFree(MemoryContext context, void *pointer) /* * AllocSetRealloc - * Returns new pointer to allocated memory of given size; this memory - * is added to the set. Memory associated with given pointer is copied - * into the new memory, and the old memory is freed.
+ * Returns new pointer to allocated memory of given size or NULL if + * request could not be completed; this memory is added to the set. + * Memory associated with given pointer is copied into the new memory, + * and the old memory is freed. * * Without MEMORY_CONTEXT_CHECKING, we don't know the old request size. This * makes our Valgrind client requests less-precise, hazarding false negatives. @@ -1109,10 -1104,7 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) if (block == NULL) { MemoryContextStats(TopMemoryContext); - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of memory"), - errdetail("Failed on request of size %zu.", size))); + return NULL; } block->freeptr = block->endptr = ((char *) block) + blksize; @@ -1179,6 +1171,10 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) /* allocate new chunk */ newPointer = AllocSetAlloc((MemoryContext) set, size); + /* leave immediately if request
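As an aside for readers following the thread, the control-flag design being discussed can be sketched outside the PostgreSQL tree. The following is a hedged, self-contained illustration using plain malloc(); alloc_flags() and NORMAL_ALLOC_CAP are invented stand-ins for the proposed MemoryContextAllocExtended() and MaxAllocSize, not the actual patch code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALLOC_HUGE   0x01   /* allow requests larger than the normal cap */
#define ALLOC_ZERO   0x02   /* clear allocated memory */
#define ALLOC_NO_OOM 0x04   /* return NULL instead of erroring on OOM */

#define NORMAL_ALLOC_CAP 0x3fffffffUL   /* stand-in for MaxAllocSize */

/* Sketch of a single central allocator driven by control flags. */
void *
alloc_flags(size_t size, int flags)
{
    void *p;

    /* Without ALLOC_HUGE, oversized requests are rejected outright. */
    if (size > NORMAL_ALLOC_CAP && !(flags & ALLOC_HUGE))
        return NULL;

    p = malloc(size);
    if (p == NULL)
    {
        if (flags & ALLOC_NO_OOM)
            return NULL;        /* caller opted in to handling OOM itself */
        fprintf(stderr, "out of memory\n");
        abort();                /* stand-in for ereport(ERROR, ...) */
    }

    if (flags & ALLOC_ZERO)
        memset(p, 0, size);
    return p;
}
```

With this shape, MemoryContextAlloc, MemoryContextAllocHuge and MemoryContextAllocZero would reduce to thin wrappers passing the appropriate flag combination to the central routine.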
Re: [HACKERS] TABLESAMPLE patch
On 28/01/15 09:41, Kyotaro HORIGUCHI wrote: As an issue related to size estimation, I got an explain result like the following: =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a > 5; QUERY PLAN Sample Scan on t1 (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1) Filter: (a > 5) Rows Removed by Filter: 5876 Buffers: shared hit=45 The actual row count is about twice the estimate. tsm_system_cost estimates the number of result rows using baserel->tuples, not baserel->rows, so it doesn't reflect the scan quals. Did the code come from some other intention? No, that's a bug. 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function (tbs_next() or something like that)? Possibly; my thinking was that unlike index_getnext() and heap_getnext(), this function would not be called from any other place, so adding one more layer of abstraction didn't seem useful. And it would mean creating a new ScanDesc struct, etc. I think adding additional abstraction would simplify the function and reduce the chance of discrepancy; I think we need to almost create a duplicate version of code for the heapgetpage() method. Yeah, I agree that we need to build a structure like ScanDesc, but it will still be worth it to keep the code consistent. Well, similar, not the same, as we are not always fetching the whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially a somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan. As a general opinion, I'll vote for Amit's comment, since three or four similar instances seems to be enough reason to abstract it.
On the other hand, since the scan processes are dispatched in ExecProcNode by the nodeTag of the scan nodes, reuniting control in an abstraction layer after that could be said to introduce useless annoyance. In short, abstraction at the level of *Next() would bring the necessity of additional changes around the execution node dispatch. Yes, that's my view too. I would generally be for that change also, and it would be worth it if the code was used in more than one place, but as it is it seems like it will just add code/complexity for no real benefit. It would make sense in case we used the sequential scan node instead of the new node as Amit also suggested, but I remain unconvinced that mixing sampling and sequential scan into a single scan node would be a good idea. How will it get smoothed for cases when, let us say, more than 50% of tuples are not visible? I think it's due to Postgres's non-overwriting storage architecture that we maintain multiple versions of rows in the same heap; otherwise, for a different kind of architecture (like mysql/oracle) where multiple row versions are not maintained in the same heap, the same function (with the same percentage) can return an entirely different number of rows. I don't know how else to explain it: if we loop over every tuple in the relation and return it with equal probability, then visibility checks don't matter, as the percentage of visible tuples will be the same in the result as in the relation. Surely it might effectively yield the same result. Even so, this code needs an explanatory comment about the nature of the code, written so that MVCC-minded people won't get confused, I suppose. Yes it does, but as I am failing to explain even here, it's not clear to me what to write there. From my point of view it's just an effect of the essential property of bernoulli sampling, which is that every element has equal probability of being included in the sample.
It comes from the fact that we do a bernoulli trial (in the code it's the while (sampler_random_fract(sampler->randstate) > probability) loop) on every individual tuple. This means that the ratio of visible and invisible tuples should be the same in the sample as it is in the relation. We then just skip the invisible tuples and get the correct percentage of the visible ones (this has the performance benefit of not having to do a visibility check on all tuples). If that weren't true, then bernoulli sampling would simply not work as intended, as the same property is the reason why it's used in statistics - the trends should look the same in the sample as they look in the source population. Obviously there is some variation in practice, as we don't have a perfect random generator, but that's independent of the algorithm. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7
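The ratio-preservation argument above is easy to check empirically. This toy simulation is not from the patch; sample_visible_fraction() and the two-thirds visibility model are invented for illustration - each tuple undergoes an independent Bernoulli trial, and the visible fraction of the sample tracks the visible fraction of the population.

```c
#include <stdlib.h>

/* Bernoulli-sample 'ntuples' tuples with inclusion probability 'prob';
 * every third tuple is deemed "invisible" (a dead row version).
 * Returns the fraction of sampled tuples that were visible. */
double
sample_visible_fraction(int ntuples, double prob, unsigned seed)
{
    int sampled = 0, visible = 0;

    srand(seed);
    for (int i = 0; i < ntuples; i++)
    {
        /* bernoulli trial on every tuple, visible or not */
        if ((double) rand() / RAND_MAX < prob)
        {
            sampled++;
            if (i % 3 != 0)     /* two thirds of the tuples are visible */
                visible++;
        }
    }
    return sampled ? (double) visible / sampled : 0.0;
}
```

Because the trial is independent of visibility, the visible share of the sample converges on the visible share of the relation (2/3 here), which is exactly why skipping invisible tuples after the trial still yields the requested percentage of visible rows.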
Re: [HACKERS] pg_upgrade and rsync
* Jim Nasby (jim.na...@bluetreble.com) wrote: On 1/27/15 9:29 AM, Stephen Frost wrote: My point is that Bruce's patch suggests looking for remote_dir in the rsync documentation, but no such term appears there. Ah, well, perhaps we could simply add a bit of clarification to this: for details on specifying <option>remote_dir</option> The whole remote_dir discussion made me think of something... would --link-dest be any help here? No. Thanks, Stephen
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 28, 2015 at 2:08 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: OTOH, spreading the I/O across multiple files is not a good thing, if you don't have a RAID setup like that. With a single spindle, you'll just induce more seeks. Perhaps the OS is smart enough to read in large-enough chunks that the occasional seek doesn't hurt much. But then again, why isn't the OS smart enough to read in large-enough chunks to take advantage of the RAID even when you read just a single file? Suppose we have N spindles and N worker processes, and it just so happens that the amount of computation is such that each spindle can keep one CPU busy. Let's suppose the chunk size is 4MB. If you read from the relation at N staggered offsets, you might be lucky enough that each one of them keeps a spindle busy, and you might be lucky enough to have that stay true as the scans advance. You don't need any particularly large amount of read-ahead; you just need to stay at least one block ahead of the CPU. But if you read the relation in one pass from beginning to end, you need at least N*4MB of read-ahead to have data in cache for all N spindles, and the read-ahead will certainly fail you at the end of every 1GB segment. The problem here, as I see it, is that we're flying blind. If there's just one spindle, I think it's got to be right to read the relation sequentially. But if there are multiple spindles, it might not be, but it seems hard to predict what we should do. We don't know what the RAID chunk size is or how many spindles there are, so any guess as to how to chunk up the relation and divide up the work between workers is just a shot in the dark. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
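For concreteness, the simplest "chunk up the relation" policy being debated - each worker taking one contiguous range - can be sketched as below. worker_range() is a hypothetical helper, not from any patch in this thread; the thread's point is precisely that without knowing the spindle count or RAID chunk size, there is no principled way to choose between this and staggered or round-robin chunking.

```c
/* Divide a relation of 'total' bytes among 'nworkers' workers; worker
 * 'w' (0-based) scans the half-open byte range [*start, *end).  The
 * last worker absorbs any remainder. */
void
worker_range(long long total, int nworkers, int w,
             long long *start, long long *end)
{
    long long per = total / nworkers;

    *start = (long long) w * per;
    *end = (w == nworkers - 1) ? total : *start + per;
}
```

With one spindle this contiguous split at least keeps each worker's own reads sequential; Robert's staggered-offset idea would instead interleave fixed-size chunks across workers, which only pays off when independent spindles can serve the concurrent streams.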
Re: [HACKERS] pg_upgrade and rsync
Bruce, * Bruce Momjian (br...@momjian.us) wrote: On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote: The example listed works, but only when it's a local rsync: rsync --archive --hard-links --size-only old_dir new_dir remote_dir Perhaps a better example (or an additional one) would be with a remote rsync, including clarification of old and new dir, like so: (run in /var/lib/postgresql) rsync --archive --hard-links --size-only \ 9.3/main \ 9.4/main \ server:/var/lib/postgresql/ Note that 9.3/main and 9.4/main are two source directories for rsync to copy over, while server:/var/lib/postgresql/ is a remote destination directory. The above directories match a default Debian/Ubuntu install. OK, sorry everyone was confused by 'remote_dir'. Does this new patch help? Looks better, but --links is not the same as --hard-links. The example is right, but the documentation below it mentions <option>--link</option>, which is for symlinks, not hard links. This also should really include a discussion about dealing with tablespaces, since the example command won't deal with them. Thanks, Stephen
Re: [HACKERS] pg_upgrade and rsync
* Bruce Momjian (br...@momjian.us) wrote: Interesting problem, but doesn't rsync use sub-second accuracy? No. A simple test will show: touch xx/aa ; rsync -avv xx yy ; sleep 0.5 ; touch xx/aa ; rsync -avv xx yy Run that a few times and you'll see it report xx/aa is uptodate sometimes, depending on exactly where within the second the sleep falls. Thanks, Stephen
[HACKERS] ya comment typo
'the the' in bufmgr.c --- src/backend/storage/buffer/bufmgr.c.orig 2015-01-28 09:13:39.681366670 +0100 +++ src/backend/storage/buffer/bufmgr.c 2015-01-28 09:14:14.225690845 +0100 @@ -138,7 +138,7 @@ static void ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref); /* - * Ensure that the the PrivateRefCountArray has sufficient space to store one + * Ensure that the PrivateRefCountArray has sufficient space to store one * more entry. This has to be called before using NewPrivateRefCountEntry() to * fill a new entry - but it's perfectly fine to not use a reserved entry. */
Re: [HACKERS] ya comment typo
On 01/28/2015 10:16 AM, Erik Rijkers wrote: 'the the' in bufmgr.c Fixed, thanks. - Heikki
Re: [HACKERS] TABLESAMPLE patch
Hello, On 19/01/15 07:08, Amit Kapila wrote: On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote: No issues, but it seems we should check other paths where different handling could be required for tablesample scan. In set_rel_size(), it uses the normal path for heapscan (set_plain_rel_size()) for rel size estimates, where it checks the presence of partial indexes; that might affect the size estimates, and that doesn't seem to be required when we have to perform a scan based on a TableSample method. I think that's actually good to have, because we still do costing and the partial index might help produce a better estimate of the number of returned rows for tablesample as well. As an issue related to size estimation, I got an explain result like the following: =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a > 5; QUERY PLAN Sample Scan on t1 (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1) Filter: (a > 5) Rows Removed by Filter: 5876 Buffers: shared hit=45 The actual row count is about twice the estimate. tsm_system_cost estimates the number of result rows using baserel->tuples, not baserel->rows, so it doesn't reflect the scan quals. Did the code come from some other intention? 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function (tbs_next() or something like that)? Possibly; my thinking was that unlike index_getnext() and heap_getnext(), this function would not be called from any other place, so adding one more layer of abstraction didn't seem useful. And it would mean creating a new ScanDesc struct, etc. I think adding additional abstraction would simplify the function and reduce the chance of discrepancy; I think we need to almost create a duplicate version of code for the heapgetpage() method.
Yeah, I agree that we need to build a structure like ScanDesc, but it will still be worth it to keep the code consistent. Well, similar, not the same, as we are not always fetching the whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially a somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan. As a general opinion, I'll vote for Amit's comment, since three or four similar instances seems to be enough reason to abstract it. On the other hand, since the scan processes are dispatched in ExecProcNode by the nodeTag of the scan nodes, reuniting control in an abstraction layer after that could be said to introduce useless annoyance. In short, abstraction at the level of *Next() would bring the necessity of additional changes around the execution node dispatch. Another one is the PageIsAllVisible() check. Done. Another thing is, don't we want to do anything for sync scans for these methods, as they are doing a heap scan? I don't follow this one, tbh. Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c. Basically, during sequential scans on the same table by different backends, we attempt to keep them synchronized to reduce the I/O. Ah this, yes, it makes sense for bernoulli, not for system though. I guess it should be used for sampling methods that use the SAS_SEQUENTIAL strategy. c. for the bernoulli method, it will first get the tupoffset with the help of the function and then apply the visibility check; it seems that could reduce the sample set way lower than expected in case a lot of tuples are not visible. Shouldn't it be done in the reverse way (first the visibility check, then call the function to see if we want to include it in the sample)? I think something similar is done in acquire_sample_rows, which seems right to me.
I don't think so; the way bernoulli works is that it returns every tuple with equal probability, so the visible tuples have the same chance of being returned as the invisible ones, and the issue should be smoothed away automatically (IMHO). How will it get smoothed for cases when, let us say, more than 50% of tuples are not visible? I think it's due to Postgres's non-overwriting storage architecture that we maintain multiple versions of rows in the same heap; otherwise, for a different kind of architecture (like mysql/oracle) where multiple row versions are not maintained in the same heap, the same function (with the same percentage) can return
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
On 01/28/2015 02:47 AM, Michael Paquier wrote: On Wed, Jan 28, 2015 at 3:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: So I'm fine with taking out both this documentation text and the dead null-pointer checks; but let's do that all in one patch, not piecemeal. In any case, this is just cosmetic cleanup; there's no actual hazard here. Attached is a patch with all those things done. Thanks, applied. I added as well an assertion in gistKeyIsEQ checking if the input datums are NULL. I believe that this is still useful for developers; feel free to rip it out from the patch if you think otherwise. I ripped it out because I think it was wrong. It assumed that the input Datums are pass-by-reference, which is not a given. It looks like that's true for all the current opclasses, so I wouldn't be surprised if there are hidden assumptions about that elsewhere in the code, but it was wrong nevertheless. - Heikki
Re: [HACKERS] WITH CHECK and Column-Level Privileges
On 27 January 2015 at 22:45, Stephen Frost sfr...@snowman.net wrote: Here's the latest set with a few additional improvements (mostly comments but also a couple missed #include's and eliminating unnecessary whitespace changes). Unless there are issues with my testing tonight or concerns raised, I'll push these tomorrow. I spotted a couple of minor things reading the patches: - There's a typo in the comment for the GetModifiedColumns() macros (...stick in into...). - The new regression test is not tidying up properly after itself, because it's trying to drop the table t1 as the wrong user. Other than that, this looks reasonable to me, and I think that for most common situations it won't reduce the detail in errors. Regards, Dean
Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
On 2015/01/19 17:10, Etsuro Fujita wrote: Attached is an updated version of the patch. I'll add this to the next CF. Best regards, Etsuro Fujita
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-26 21:13:31 -0500, Robert Haas wrote: On Mon, Jan 26, 2015 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote: Contrary opinions? Robert? I'm totally OK with further aligning just that one allocation. Of course, now that I think about it, aligning it probably works mostly because the size is almost exactly one cache line. Not just almost - it's precisely 64 bytes on x86-64 Linux. And I think it'll also be that on other 64-bit platforms. The only architecture-dependent thing in there is the spinlock, and that can already be between 1 and 4 bytes without changing the size of the struct. If it were any bigger or smaller, aligning it more wouldn't help. Yea. So maybe we should also do something like what LWLocks do, and make a union between the actual structure and an appropriate array of padding bytes - say either 64 or 128 of them. Hm. That's a bit bigger patch. I'm inclined to just let it slide for the moment. I still have plans to downsize some of sbufdesc's content (move the io lock out) and move the content lwlock inline. Then we're not going to have much choice but to do this... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
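The union-with-padding idiom Andres refers to (already used for LWLocks) looks roughly like this. ToyBufferDesc is a placeholder with invented fields, not the real BufferDescData; the point is only that the union forces each array element onto its own cache line regardless of the struct's natural size.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64

typedef struct ToyBufferDesc
{
    int  buf_id;
    int  flags;
    int  refcount;
    char spinlock;              /* placeholder; a real slock_t is 1-4 bytes */
} ToyBufferDesc;

/* Pad each descriptor to a full cache line, so adjacent elements of an
 * array of these never share a line (avoiding false sharing). */
typedef union ToyBufferDescPadded
{
    ToyBufferDesc desc;
    char          pad[CACHE_LINE_SIZE];
} ToyBufferDescPadded;

/* With C11 the invariant can be checked at compile time. */
_Static_assert(sizeof(ToyBufferDescPadded) == CACHE_LINE_SIZE,
               "padded descriptor must be exactly one cache line");
```

Given a 64-byte-aligned base pointer for the array, every element then starts on a cache-line boundary, which is the property the thread found mattered so much on AMD.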
Re: [HACKERS] TABLESAMPLE patch
On 28/01/15 08:23, Kyotaro HORIGUCHI wrote: Hi, I took a look at this and it looks nice. By the way, the parameter for REPEATABLE seems to allow an expression in ParseTableSample, but the grammar rejects it. It wasn't my intention to support it, but you are correct; the code is generic enough that we can support it. The following change seems enough. Seems about right, thanks. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] PQgetssl() and alternative SSL implementations
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: What bothers me about this is that it punts SSL work to the application and requires that they be coded to work with both OpenSSL and whatever else we implement (eg: GnuTLS) to do anything but the most simple checks. That's a problem because people are *not* going to want to #include both OpenSSL and GnuTLS headers into their applications, because they don't know which PG will be compiled with. Not to mention that it'd be darn awkward to do so. The point of this is to provide an escape hatch for people who really want to do XYZ even though we provide no API for XYZ in libpq. Hopefully, those people will be few and far between, because anything that's a really common requirement should be catered for by libpq. I understand that, but 4 variables is pretty darn far from what an application developer working with SSL is going to want. As I've mentioned before when this has been brought up, I'm of the opinion that we should be providing, from the start, the same set as Apache's SSL environment variables. The mod_ssl (OpenSSL-based) documentation: http://httpd.apache.org/docs/2.2/mod/mod_ssl.html For mod_gnutls, this is the list of SSL variables provided: http://www.outoforder.cc/projects/apache/mod_gnutls/docs/#environment-variables Note that they're pretty much the same set, so providing them for OpenSSL isn't closing off the ability to provide GnuTLS in the future. To be clear, I'm not asking for all of this to happen in the first patch, but I'd like whomever is going forward with this to at least agree that they're going to try and cover the Apache set for whatever libraries are supported in the first major release we put out with this. Considering the example is already there, I'm really hopeful that isn't too difficult to do. Thanks, Stephen
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Noah Misch n...@leadboat.com writes: On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote: So at this point I propose that we reject \u0000 when de-escaping JSON. I would have agreed on 2014-12-09, and this release is the last chance to make such a change. It is a bold wager that could pay off, but -1 from me anyway. You only get to vote -1 if you have a credible alternative. I don't see one. I can already envision the blog post from the DBA staying on 9.4.0 because 9.4.1 pulled his ability to store U+0000 in jsonb. Those will be more or less the same people who bitch about text not storing NULs; the world has not fallen. jsonb was *the* top-billed 9.4 feature, and this thread started with Andrew conveying a field report of a scenario more obscure than storing U+0000. There is a separate issue, which is that our earlier attempt to make the world safe for \u0000 actually broke other things. We still need to fix that, but I think the fix probably consists of reverting that patch and instead disallowing \u0000. Anybody who's seriously unhappy with that can propose a patch to fix it properly in 9.5 or later. Someone can still do that by introducing a V2 of the jsonb binary format and preserving the ability to read both formats. (Too bad Andres's proposal to include a format version didn't inform the final format, but we can wing it.) I agree that storing U+0000 as 0x00 is the best end state. We will not need a v2 format, merely values that contain NULs. Existing data containing \u0000 will be read as those six ASCII characters, which is not really wrong since that might well be what it was anyway. We probably need to rethink the re-escaping behavior as well; I'm not sure if your latest patch is the right answer for that. Yes, we do.
No change to the representation of U+0000 is going to fix the following bug, but that patch does fix it: [local] test=# select test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb, test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb::text::jsonb; The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d, which I'm inclined to think we need to simply revert, not render even more squirrelly. regards, tom lane
Re: [HACKERS] Sequence Access Method WIP
On 01/23/2015 02:34 AM, Petr Jelinek wrote: On 22/01/15 17:02, Petr Jelinek wrote: The new version (the one that is not submitted yet) of gapless sequence is way more ugly and probably not the best example either, but it does guarantee gaplessness (it stores the last value in its own value table). So I am not sure if it should be included either... Here it is as promised. I generally like the division of labour between the seqam implementations and the API now. I don't like the default_sequenceam GUC. That seems likely to create confusion. And it's not really enough for something like a remote sequence AM anyway, which definitely needs some extra reloptions, like the hostname of the remote server. The default should be 'local', unless you specify something else with CREATE SEQUENCE USING - no GUCs. Some comments on pg_dump: * In pg_dump's dumpSequence function, check the server version number instead of checking whether the pg_sequence_dump_state() function exists. That's what we usually do when new features are introduced. And there's actually a bug there: you have the check backwards. (try dumping a database with any sequences in it; it fails) * pg_dump should not output a USING clause when a sequence uses the 'local' sequence AM. No point in adding such noise - making the SQL command not standard-compatible - for the 99% of databases that don't use other sequence AMs. * Be careful to escape all strings correctly in pg_dump. I think you're missing escaping for the 'state' variable at least. In sequence_save_tuple: else { /* * New tuple was not sent, so the original tuple was probably just * changed inline, all we need to do is mark the buffer dirty and * optionally log the update tuple. */ START_CRIT_SECTION(); MarkBufferDirty(seqh->buf); if (do_wal) log_sequence_tuple(seqh->rel, seqh->tup, seqh->buf, page); END_CRIT_SECTION(); } This is wrong when checksums are enabled and !do_wal. I believe this should use MarkBufferDirtyHint().
Notable changes: - The gapless sequence rewritten to use the transactional storage as that's the only way to guarantee gaplessness between dump and restore. Can you elaborate? Using the auxiliary seqam_gapless_values is a bit problematic. First of all, the event trigger on DROP SEQUENCE doesn't fire if you drop the whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, updating a row in a table for every nextval() call is pretty darn expensive. But I don't actually see a problem with dump and restore. Can you rewrite it without using the values table? AFAICS, there are two problems to solve: 1. If the top-level transaction aborts, you need to restore the old value. I suggest keeping two values in the sequence tuple: the old, definitely committed one, and the last value. The last value is only considered current if the associated XID committed; otherwise the old value is current. When a transaction is about to commit, it stores its top-level XID and the new value in the last field, and copies the previously current value to the old field. 2. You need to track the last value on a per-subtransaction basis, until the transaction commits/rolls back, in order for rollback to savepoint to retreat the sequence's value. That can be done in backend-private memory, maintaining a stack of subtransactions and the last value of each. Use RegisterSubXactCallback to hook into subxact commit/abort. Just before top-level commit (in the pre-commit callback), update the sequence tuple with the latest value, so that it gets WAL-logged before the commit record. - Heikki
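Heikki's two-value scheme can be sketched as a toy model. ToySeqTuple, seq_current() and seq_precommit() are hypothetical names, and commit state is simplified to a single "highest committed xid" - a real implementation would consult the clog - but the resolution rule is the one described: "last" is current only if the XID that wrote it committed; otherwise "old" is.

```c
#include <stdint.h>

typedef struct ToySeqTuple
{
    int64_t  old_value;   /* definitely-committed value */
    int64_t  last_value;  /* value written by last_xid */
    uint32_t last_xid;    /* top-level xid that wrote last_value */
} ToySeqTuple;

/* The currently-live sequence value.  In this toy model xids commit in
 * order, so "committed" means last_xid <= committed_upto. */
int64_t
seq_current(const ToySeqTuple *t, uint32_t committed_upto)
{
    if (t->last_xid != 0 && t->last_xid <= committed_upto)
        return t->last_value;   /* writer committed: last is current */
    return t->old_value;        /* writer aborted/in-progress: keep old */
}

/* At pre-commit, the committing xid publishes its new value and demotes
 * whatever was current until now into the 'old' slot. */
void
seq_precommit(ToySeqTuple *t, uint32_t my_xid, int64_t new_value,
              uint32_t committed_upto)
{
    t->old_value = seq_current(t, committed_upto);
    t->last_value = new_value;
    t->last_xid = my_xid;
}
```

The attraction of this shape is that an aborted transaction needs no cleanup at all: its value simply never becomes current, which is what makes the sequence gapless without a per-nextval() table update.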
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/28/2015 12:50 AM, Noah Misch wrote: On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 02:28 PM, Tom Lane wrote: Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll regret it if we don't change the representation. So at this point I propose that we reject \u0000 when de-escaping JSON. I would have agreed on 2014-12-09, and this release is the last chance to make such a change. It is a bold wager that could pay off, but -1 from me anyway. I can already envision the blog post from the DBA staying on 9.4.0 because 9.4.1 pulled his ability to store U+0000 in jsonb. jsonb was *the* top-billed 9.4 feature, and this thread started with Andrew conveying a field report of a scenario more obscure than storing U+0000. Therefore, we have to assume many users will notice the change. This move would also add to the growing evidence that our .0 releases are really beta(N+1) releases in disguise. Anybody who's seriously unhappy with that can propose a patch to fix it properly in 9.5 or later. Someone can still do that by introducing a V2 of the jsonb binary format and preserving the ability to read both formats. (Too bad Andres's proposal to include a format version didn't inform the final format, but we can wing it.) I agree that storing U+0000 as 0x00 is the best end state. We need to make up our minds about this pretty quickly. The more radical move is likely to involve quite a bit of work, ISTM. It's not clear to me how we should represent a unicode null. i.e. given a json of '["foo\u0000bar"]', I get that we'd store the element as 'foo\x00bar', but what is the result of (jsonb '["foo\u0000bar"]')->0 It's defined to be text so we can't just shove a binary null in the middle of it. Do we throw an error? And I still want to hear more voices on the whole direction we want to take this. 
cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PQgetssl() and alternative SSL implementations
On 01/28/2015 06:58 PM, Stephen Frost wrote: Although I think "OpenSSL SSL" is a little bit duplicatively redundant. Why not just "OpenSSL"? I wondered also, but figured it was probably because it's OpenSSL's ssl structure, which then made sense. Right, that was the idea. I wanted it to include the word OpenSSL, to make it clear in the callers that it's specific to OpenSSL. And SSL, because that's the name of the struct. I agree it looks silly, though. One idea is to have two separate arguments: the implementation name, and the struct name. PQgetSSLstruct(ssl, "OpenSSL", "SSL") would look less silly. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
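The two-argument lookup Heikki floats could take roughly this shape. Everything here is a hypothetical sketch of the idea, not libpq's actual API; the struct and function names are invented:

```c
#include <stddef.h>
#include <string.h>

/* Toy connection object: records which SSL implementation the build uses
 * and a pointer to that implementation's connection struct. */
struct PGconn_toy
{
    const char *ssl_impl;   /* e.g. "OpenSSL", or NULL if no SSL */
    void       *ssl_struct; /* the implementation-specific object */
};

/* Return the named struct if the running build uses the named
 * implementation, else NULL, so callers can probe safely. */
static void *
PQgetSSLstruct_toy(struct PGconn_toy *conn, const char *impl,
                   const char *structname)
{
    if (conn->ssl_impl == NULL || strcmp(conn->ssl_impl, impl) != 0)
        return NULL;

    /* 'structname' would select among several structs an implementation
     * exposes; this toy exposes only one, so any name maps to it. */
    (void) structname;
    return conn->ssl_struct;
}
```

The point of the separate arguments is that a caller asking for ("GnuTLS", "session") on an OpenSSL build gets NULL rather than a mistyped pointer.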
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Andrew Dunstan and...@dunslane.net writes: It's not clear to me how we should represent a unicode null. i.e. given a json of '["foo\u0000bar"]', I get that we'd store the element as 'foo\x00bar', but what is the result of (jsonb '["foo\u0000bar"]')->0 It's defined to be text so we can't just shove a binary null in the middle of it. Do we throw an error? Yes, that is what I was proposing upthread. Obviously, this needs some thought to ensure that there's *something* useful you can do with a field containing a nul, but we'd have little choice but to throw an error if the user asks us to convert such a field to unescaped text. I'd be a bit inclined to reject nuls in object field names even if we allow them in field values, since just about everything you can usefully do with a field name involves regarding it as text. Another interesting implementation problem is what does indexing do with such values --- ISTR there's an implicit conversion to C strings in there too, at least in GIN indexes. Anyway, there is a significant amount of work involved here, and there's no way we're getting it done for 9.4.1, or probably 9.4.anything. I think our only realistic choice right now is to throw error for \u0000 so that we can preserve our options for doing something useful with it later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
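The rejection being proposed amounts to noticing the \u0000 escape while de-escaping and erroring out. A standalone sketch of just the detection step (this is not the jsonb parser, which of course handles the full escape grammar, surrogate pairs, and error reporting):

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the raw (still-escaped) JSON string body contains the
 * escape sequence \u0000. An escaped backslash (\\) is skipped so that
 * the literal text "\\u0000" is not mistaken for the escape. Assumes the
 * input's escapes are otherwise well-formed. */
static bool
contains_unicode_nul(const char *s)
{
    for (; *s; s++)
    {
        if (s[0] == '\\' && s[1] == '\\')
        {
            s++;                /* skip the escaped backslash pair */
            continue;
        }
        if (s[0] == '\\' && s[1] == 'u' && strncmp(s + 2, "0000", 4) == 0)
            return true;
    }
    return false;
}
```

A caller in the de-escaping path would raise the error at the point this returns true, which is what preserves the option of storing U+0000 as a real 0x00 byte in some later format version.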
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Jeff Davis pg...@j-davis.com writes: On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Uh, sorry, I've not been paying any attention to this thread for awhile. What's the remaining questions at issue? This patch is trying to improve the array_agg case where there are many arrays being constructed simultaneously, such as in HashAgg. You strongly suggested that a commitable patch would differentiate between the grouped and ungrouped cases (or perhaps you meant differentiate between HashAgg and sorted aggregation?). Tomas's current patch does not do so; it does two main things: 1. Uses a common memory context for arrays being constructed by array_agg in any context (ungrouped, sorted agg, and HashAgg) 2. Reduces the initial array allocation to 8 elements from 64, but preserves doubling behavior Sorry for slow response on this. I took a look at the v8 patch (that's still latest no?). I see that it only gets rid of the separate context for the two callers array_agg_transfn and array_agg_array_transfn, for which no memory release can happen anyway until the parent context is reset (cf comments in corresponding finalfns). So that's fine --- it is not making any bloat case any worse, at least. I'm not sure about whether reducing the initial Datum array size across-the-board is a good thing or not. One obvious hack to avoid unexpected side effects on other callers would be astate->alen = (subcontext ? 64 : 8); The larger initial size is basically free when subcontext is true, anyway, considering it will be coming out of an 8K initial subcontext allocation. This still leaves us wondering whether the smaller initial size will hurt array_agg itself, but that's at least capable of being tested with a reasonably small number of test cases. I strongly object to removing initArrayResultArr's element_type argument. That's got nothing to do with the stated purpose of the patch, and it forces a non-too-cheap catalog lookup to be done inside initArrayResultArr. 
It's true that some of the current call sites are just doing the same lookup themselves anyway, but we should not foreclose the possibility that the caller has the data already (as some others do) or is able to cache it across multiple uses. A possible compromise is to add the convention that callers can pass InvalidOid if they want initArrayResultArr to do the lookup. (In any case, the function's header comment needs adjustment if the API spec changes.) Other than that, I think the API changes here are OK. Adding a new argument to existing functions is something we do all the time, and it's clear how to modify any callers to get the same behavior as before. We could perhaps clean things up with a more invasive redefinition, but I doubt it's worth inflicting more pain on third party callers. Another thing I'd change is this: + /* we can only release the context if it's a private one. */ + Assert(!(release && !astate->private_cxt)); + /* Clean up all the junk */ if (release) MemoryContextDelete(astate->mcontext); Why not this way: /* Clean up all the junk */ if (release) + { + Assert(astate->private_cxt); MemoryContextDelete(astate->mcontext); + } Seems a lot more understandable, and less code too. I concur with the concerns that the comments could do with more work, but haven't attempted to improve them myself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
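The sizing hack Tom suggests, together with the doubling growth the patch preserves, can be modeled in a few lines of standalone C. Field names mimic ArrayBuildState but this is a toy for illustration, not the patch's code:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy accumulator: start small (8) when sharing the parent context, large
 * (64) when the state owns an 8K subcontext where the space is free
 * anyway, and double the Datum array whenever it fills up. */
typedef struct
{
    long   *dvalues;    /* stand-in for the Datum array */
    int     alen;       /* allocated length */
    int     nelems;     /* number of elements used */
} ArrayBuildStateToy;

static void
init_state(ArrayBuildStateToy *astate, bool subcontext)
{
    astate->alen = (subcontext ? 64 : 8);
    astate->nelems = 0;
    astate->dvalues = malloc(astate->alen * sizeof(long));
}

static void
accum(ArrayBuildStateToy *astate, long value)
{
    if (astate->nelems >= astate->alen)
    {
        astate->alen *= 2;      /* preserve the doubling behavior */
        astate->dvalues = realloc(astate->dvalues,
                                  astate->alen * sizeof(long));
    }
    astate->dvalues[astate->nelems++] = value;
}
```

With many simultaneous groups (the HashAgg case) the subcontext is false and each group starts at 8 elements, which is where the memory saving comes from.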
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Jim Nasby jim.na...@bluetreble.com writes: While I sympathize with Noah's sentiments, the only thing that makes sense to me is that a JSON text field is treated the same way as we treat text. Right now, that means NUL is not allowed, period. If no one has bitched about this with text, is it really that big a problem with JSON? Oh, people have bitched about it all right ;-). But I think your real question is how many applications find that restriction to be fatal. The answer clearly is not a lot for text, and I don't see why the answer would be different for JSON. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
On 2015-01-28 15:32:15 +0000, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: ISTM that the check is just overzealous and/or needs to be moved into ImportSnapshot(). There it then could be made to check if the exporting xact was also deferrable. That would be great if ImportSnapshot had access to that information; I don't see it, though. Having pg_dump use repeatable read transactions for the processes that import the snapshot would work fine, as long as they are reading a snapshot which was captured by a serializable read only deferrable transaction. Then add that information? The disk format for snapshots isn't persistent across restarts, so we can just extend it. I really don't like adding hacks like using a lower serializability level than what's actually requested just because it happens to be easier. Even if it's just in some backend. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make hba available to client code
On 01/28/2015 04:26 PM, David Fetter wrote: On Wed, Jan 28, 2015 at 04:10:42PM -0500, Tom Lane wrote: David Fetter da...@fetter.org writes: While investigating another project, namely adding pg_hba.conf support to pgbouncer, I ran into a stumbling block others probably will, too: the hba code is backend-only, which means that if I were to do this as-is, I would be cooking a batch of very unappetizing copypasta. I'm allergic to copypasta, so unless there are big objections, I'd like to export those functions to make hba available to other code. How exactly would exporting those functions help anything client-side? Right now, pgbouncer, and aspirational things like it--other connection poolers, maybe distributed transaction managers, etc.--can fairly easily act almost like a direct connection to PostgreSQL, except for some important exceptions. One that's cropped up several times is the ability to gate auth by network and user, that being what pg_hba.conf allows. A conversation with Andrew Dunstan since I posted convinced me that the amount of work to separate this cleanly and have it perform somewhere in the close range of as well as it does now could be pretty significant. I should add that I am working on an hba facility for pgbouncer. It's currently in trial. I did pull in a few pieces of the core hba code, but not a great deal - mainly to do with handling IP addresses and masks. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 1/28/15 9:56 AM, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 28, 2015 at 10:40 AM, Tom Lane t...@sss.pgh.pa.us wrote: I thought the proposal to chunk on the basis of each worker processes one 1GB-sized segment should work all right. The kernel should see that as sequential reads of different files, issued by different processes; and if it can't figure out how to process that efficiently then it's a very sad excuse for a kernel. Agreed. I agree. But there's only value in doing something like that if we have evidence that it improves anything. Such evidence is presently a bit thin on the ground. You need an i/o subsystem that's fast enough to keep a single CPU busy, otherwise (as you mentioned elsewhere), you're just going to be i/o bound and having more processes isn't going to help (and could hurt). Such i/o systems do exist, but a single RAID5 group over spinning rust with a simple filter isn't going to cut it with a modern CPU - we're just too darn efficient to end up i/o bound in that case. A more complex filter might be able to change it over to being more CPU bound than i/o bound and produce the performance improvements you're looking for. Except we're nowhere near being IO efficient. The vast difference between Postgres IO rates and dd shows this. I suspect that's because we're not giving the OS a list of IO to perform while we're doing our thing, but that's just a guess. The caveat to this is if you have multiple i/o *channels* (which it looks like you don't in this case) where you can parallelize across those channels by having multiple processes involved. Keep in mind that multiple processes are in no way a requirement for that. Async IO would do that, or even just requesting stuff from the OS before we need it. We only support multiple i/o channels today with tablespaces and we can't span tables across tablespaces. 
That's a problem when working with large data sets, but I'm hopeful that this work will eventually lead to a parallelized Append node that operates against a partitioned/inherited table to work across multiple tablespaces. Until we can get a single seqscan close to dd performance, I fear worrying about tablespaces and IO channels is entirely premature. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
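For what it's worth, the segment-per-worker chunking discussed upthread reduces to a tiny mapping from block number to worker. This is only a sketch of the idea with illustrative constants and names, not anything from the patch:

```c
#include <stdint.h>

/* Blocks per 1GB relation segment, assuming the default 8K block size.
 * Both constants mirror PostgreSQL's defaults but are hardcoded here
 * purely for the example. */
#define BLCKSZ_TOY      8192
#define RELSEG_TOY      (1024 * 1024 * 1024 / BLCKSZ_TOY)   /* 131072 blocks */

/* Assign each 1GB segment to a worker round-robin, so every worker reads
 * its segment file sequentially and the kernel sees per-process
 * sequential I/O on distinct files. */
static int
segment_owner(uint32_t blockno, int nworkers)
{
    return (int) ((blockno / RELSEG_TOY) % nworkers);
}
```

A worker would then scan only the blocks for which segment_owner() returns its own index, which is what makes each worker's reads sequential within one file.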
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
Kevin Grittner kgri...@ymail.com wrote: Having pg_dump use repeatable read transactions for the processes that import the snapshot would work fine, as long as they are reading a snapshot which was captured by a serializable read only deferrable transaction. It looks like the attached patch does it (although it is only lightly tested so far and only on the master branch). This seems like a back-patchable bug fix (to 9.3). Thoughts? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index c3ebb3a..65ccc93 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -1011,7 +1011,7 @@ setup_connection(Archive *AH, DumpOptions *dopt, const char *dumpencoding, ExecuteSqlStatement(AH, "BEGIN"); if (AH->remoteVersion >= 90100) { - if (dopt->serializable_deferrable) + if (dopt->serializable_deferrable && AH->sync_snapshot_id == NULL) ExecuteSqlStatement(AH, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE"); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump issue - push useless statements REVOKE, GRANT
Hi I have a dump with thousands of objects. I often find this pattern in the dump, which makes no sense. These operations have zero effect, but they slow down the database restore. Is this expected behaviour? REVOKE ALL ON TABLE zobjrozp FROM PUBLIC; REVOKE ALL ON TABLE zobjrozp FROM sitkhaso; GRANT ALL ON TABLE zobjrozp TO sitkhaso; GRANT ALL ON TABLE zobjrozp TO PUBLIC; REVOKE ALL ON TABLE zppolozky FROM PUBLIC; REVOKE ALL ON TABLE zppolozky FROM sitkhaso; GRANT ALL ON TABLE zppolozky TO sitkhaso; GRANT ALL ON TABLE zppolozky TO PUBLIC; ... 5000x Regards Pavel Stehule
Re: [HACKERS] Make hba available to client code
David Fetter da...@fetter.org writes: While investigating another project, namely adding pg_hba.conf support to pgbouncer, I ran into a stumbling block others probably will, too: the hba code is backend-only, which means that if I were to do this as-is, I would be cooking a batch of very unappetizing copypasta. I'm allergic to copypasta, so unless there are big objections, I'd like to export those functions to make hba available to other code. How exactly would exporting those functions help anything client-side? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 1/28/15 11:36 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: It's not clear to me how we should represent a unicode null. i.e. given a json of '["foo\u0000bar"]', I get that we'd store the element as 'foo\x00bar', but what is the result of (jsonb '["foo\u0000bar"]')->0 It's defined to be text so we can't just shove a binary null in the middle of it. Do we throw an error? Yes, that is what I was proposing upthread. Obviously, this needs some thought to ensure that there's *something* useful you can do with a field containing a nul, but we'd have little choice but to throw an error if the user asks us to convert such a field to unescaped text. I'd be a bit inclined to reject nuls in object field names even if we allow them in field values, since just about everything you can usefully do with a field name involves regarding it as text. Another interesting implementation problem is what does indexing do with such values --- ISTR there's an implicit conversion to C strings in there too, at least in GIN indexes. Anyway, there is a significant amount of work involved here, and there's no way we're getting it done for 9.4.1, or probably 9.4.anything. I think our only realistic choice right now is to throw error for \u0000 so that we can preserve our options for doing something useful with it later. My $0.01: While I sympathize with Noah's sentiments, the only thing that makes sense to me is that a JSON text field is treated the same way as we treat text. Right now, that means NUL is not allowed, period. If no one has bitched about this with text, is it really that big a problem with JSON? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make hba available to client code
David Fetter da...@fetter.org writes: On Wed, Jan 28, 2015 at 04:10:42PM -0500, Tom Lane wrote: How exactly would exporting those functions help anything client-side? Right now, pgbouncer, and aspirational things like it--other connection poolers, maybe distributed transaction managers, etc.--can fairly easily act almost like a direct connection to PostgreSQL, except for some important exceptions. One that's cropped up several times is the ability to gate auth by network and user, that being what pg_hba.conf allows. A conversation with Andrew Dunstan since I posted convinced me that the amount of work to separate this cleanly and have it perform somewhere in the close range of as well as it does now could be pretty significant. I'm not sure I understand what you mean by separate this cleanly, but if what you mean is rewrite hba.c so that it works either in frontend or backend, I don't think I'm going to like the result; and I'm not convinced that client-side code would find it all that useful either. The API, error handling, and memory management would probably all need to be a great deal different on client side. And serving two masters like that would result in an unreadable mess. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
Kevin Grittner kgri...@ymail.com writes: Kevin Grittner kgri...@ymail.com wrote: Having pg_dump use repeatable read transactions for the processes that import the snapshot would work fine, as long as they are reading a snapshot which was captured by a serializable read only deferrable transaction. It looks like the attached patch does it (although it is only lightly tested so far and only on the master branch). This seems like a back-patchable bug fix (to 9.3). Thoughts? A comment seems essential here, because as written anybody would think the test for a snapshot is a bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
Andres Freund and...@2ndquadrant.com wrote: On 2015-01-28 15:32:15 +0000, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: ISTM that the check is just overzealous and/or needs to be moved into ImportSnapshot(). There it then could be made to check if the exporting xact was also deferrable. That would be great if ImportSnapshot had access to that information; I don't see it, though. Having pg_dump use repeatable read transactions for the processes that import the snapshot would work fine, as long as they are reading a snapshot which was captured by a serializable read only deferrable transaction. Then add that information? The disk format for snapshots isn't persistent across restarts, so we can just extend it. I really don't like adding hacks like using a lower serializability level than what's actually requested just because it happens to be easier. Even if it's just in some backend. I see your point, and will look at that for the master branch, but it hardly seems like something to back-patch; and the messy failure of this combination of options on 9.3 and 9.4 seems like it deserves a fix. Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: Kevin Grittner kgri...@ymail.com wrote: Having pg_dump use repeatable read transactions for the processes that import the snapshot would work fine, as long as they are reading a snapshot which was captured by a serializable read only deferrable transaction. It looks like the attached patch does it (although it is only lightly tested so far and only on the master branch). This seems like a back-patchable bug fix (to 9.3). Thoughts? A comment seems essential here, because as written anybody would think the test for a snapshot is a bug. Good point. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On Mon, Jan 26, 2015 at 11:21 PM, Andreas Karlsson andr...@proxel.se wrote: Do you also think the SQL functions should be named numeric_int128_sum, numeric_int128_avg, etc? Some quick review comments. These apply to int128-agg-v5.patch. * Why is there no declaration of the function numeric_int16_stddev_internal() within numeric.c? * I concur with others - we should stick to int16 for the SQL interface. The inconsistency there is perhaps slightly confusing, but that's historic. * I'm not sure about the idea of polymorphic catalog functions (that return the type internal, but the actual struct returned varying based on build settings). I tend to think that things would be better if there was always a uniform return type for such internal type returning functions, but that *its* structure varied according to the availability of int128 (i.e. HAVE_INT128) at compile time. What we should probably do is have a third aggregate struct, that encapsulates this idea of (say) int2_accum() piggybacking on one of either Int128AggState or NumericAggState directly. Maybe that would be called PolyNumAggState. Then this common code is all that is needed on both types of build (at the top of int4_accum(), for example): PolyNumAggState *state; state = PG_ARGISNULL(0) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(0); I'm not sure if I'd do this with a containing struct or a simple #ifdef HAVE_INT128 typedef #else ... , but I think it's an improvement either way. * You didn't update this unequivocal comment to not be so strong: * Integer data types all use Numeric accumulators to share code and * avoid risk of overflow. That's all I have for now... -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
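A minimal standalone sketch of the PolyNumAggState idea from the review above. It uses a compiler probe in place of configure's HAVE_INT128 test so it runs anywhere, and the fallback layout merely stands in for NumericAggState; the accumulator is stripped down to a sum and count for illustration:

```c
#include <stdint.h>

/* One name, two layouts: a 128-bit accumulator when the compiler offers
 * one, a stand-in otherwise. Callers only ever see PolyNumAggState. */
#if defined(__SIZEOF_INT128__)
#define HAVE_INT128 1
typedef struct PolyNumAggState
{
    __int128  sumX;     /* sum of inputs, no overflow risk for int4 */
    int64_t   N;        /* number of inputs */
} PolyNumAggState;
#else
typedef struct PolyNumAggState
{
    double    sumX;     /* stand-in for the NumericAggState path */
    int64_t   N;
} PolyNumAggState;
#endif

/* The common code at the top of a transition function compiles the same
 * way on both build types, which is the point of the shared typedef. */
static void
int4_accum_toy(PolyNumAggState *state, int32_t newval)
{
    state->sumX += newval;
    state->N++;
}
```

In the real patch only the struct internals would differ between builds; the transition functions would branch on HAVE_INT128 where the representations actually diverge.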
[HACKERS] Make hba available to client code
Folks, While investigating another project, namely adding pg_hba.conf support to pgbouncer, I ran into a stumbling block others probably will, too: the hba code is backend-only, which means that if I were to do this as-is, I would be cooking a batch of very unappetizing copypasta. I'm allergic to copypasta, so unless there are big objections, I'd like to export those functions to make hba available to other code. Objections? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On 28.1.2015 21:28, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Uh, sorry, I've not been paying any attention to this thread for awhile. What's the remaining questions at issue? This patch is trying to improve the array_agg case where there are many arrays being constructed simultaneously, such as in HashAgg. You strongly suggested that a commitable patch would differentiate between the grouped and ungrouped cases (or perhaps you meant differentiate between HashAgg and sorted aggregation?). Tomas's current patch does not do so; it does two main things: 1. Uses a common memory context for arrays being constructed by array_agg in any context (ungrouped, sorted agg, and HashAgg) 2. Reduces the initial array allocation to 8 elements from 64, but preserves doubling behavior Sorry for slow response on this. I took a look at the v8 patch (that's still latest no?). I see that it only gets rid of the Yes, v8 is the latest version I submitted. separate context for the two callers array_agg_transfn and array_agg_array_transfn, for which no memory release can happen anyway until the parent context is reset (cf comments in corresponding finalfns). So that's fine --- it is not making any bloat case any worse, at least. I'm not sure about whether reducing the initial Datum array size across-the-board is a good thing or not. One obvious hack to avoid unexpected side effects on other callers would be astate->alen = (subcontext ? 64 : 8); The larger initial size is basically free when subcontext is true, anyway, considering it will be coming out of an 8K initial subcontext allocation. Seems like a good idea. If we're using 8kB contexts, we can preallocate 64 elements right away. But maybe we could decrease the 8kB context size to 1kB? For 64 elements it's just 512B + 64B for the arrays, and a bit of space for the ArrayBuildState. So maybe ~600B in total. 
Of course, this does not include space for pass-by-ref values. Anyway, I have no problem with this - I mostly care just about the array_agg() case. All the other places may adopt the 'common context' approach to get the benefit. This still leaves us wondering whether the smaller initial size will hurt array_agg itself, but that's at least capable of being tested with a reasonably small number of test cases. I plan to do more testing to confirm this - my initial testing seemed to confirm this, but I'll repeat that with the current patch. I strongly object to removing initArrayResultArr's element_type argument. That's got nothing to do with the stated purpose of the patch, and it forces a non-too-cheap catalog lookup to be done inside initArrayResultArr. It's true that some of the current call sites are just doing the same lookup themselves anyway, but we should not foreclose the possibility that the caller has the data already (as some others do) or is able to cache it across multiple uses. A possible compromise is to add the convention that callers can pass InvalidOid if they want initArrayResultArr to do the lookup. (In any case, the function's header comment needs adjustment if the API spec changes.) Fair point, and the InvalidOid approach seems sane to me. Other than that, I think the API changes here are OK. Adding a new argument to existing functions is something we do all the time, and it's clear how to modify any callers to get the same behavior as before. We could perhaps clean things up with a more invasive redefinition, but I doubt it's worth inflicting more pain on third party callers. Another thing I'd change is this: + /* we can only release the context if it's a private one. */ + Assert(!(release && !astate->private_cxt)); + /* Clean up all the junk */ if (release) MemoryContextDelete(astate->mcontext); Why not this way: /* Clean up all the junk */ if (release) + { + Assert(astate->private_cxt); MemoryContextDelete(astate->mcontext); + } Seems a lot more understandable, and less code too. Yeah, I agree it's easier to understand. I concur with the concerns that the comments could do with more work, but haven't attempted to improve them myself. There were a few comments about this, after the v8 patch, with recommended comment changes. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make hba available to client code
On Wed, Jan 28, 2015 at 04:10:42PM -0500, Tom Lane wrote: David Fetter da...@fetter.org writes: While investigating another project, namely adding pg_hba.conf support to pgbouncer, I ran into a stumbling block others probably will, too: the hba code is backend-only, which means that if I were to do this as-is, I would be cooking a batch of very unappetizing copypasta. I'm allergic to copypasta, so unless there are big objections, I'd like to export those functions to make hba available to other code. How exactly would exporting those functions help anything client-side? Right now, pgbouncer, and aspirational things like it--other connection poolers, maybe distributed transaction managers, etc.--can fairly easily act almost like a direct connection to PostgreSQL, except for some important exceptions. One that's cropped up several times is the ability to gate auth by network and user, that being what pg_hba.conf allows. A conversation with Andrew Dunstan since I posted convinced me that the amount of work to separate this cleanly and have it perform somewhere in the close range of as well as it does now could be pretty significant. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] File based Incremental backup v7
Hi Marco, 2015-01-27 19:04 GMT+01:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it : I've done some tests and it looks like FSM nodes always have InvalidXLogRecPtr as LSN. I've updated the patch to always include files if all their pages have LSN == InvalidXLogRecPtr. Updated patch v7 attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

I've tried again to run a new test of the incremental backup, introducing a new tablespace after a base backup, using version 7 of the patch and the new version of the restore script attached in http://www.postgresql.org/message-id/54c7cdad.6060...@2ndquadrant.it:

# define here your work dir
WORK_DIR='/home/gbroccolo/pgsql'

# preliminary steps
rm -rf /tmp/data /tmp/tbls tbls/ backups/

# create a test db and a backup repository
psql -c "DROP DATABASE IF EXISTS pgbench"
psql -c "CREATE DATABASE pgbench"
pgbench -U postgres -i -s 5 -F 80 pgbench
mkdir -p backups

# a first base backup with pg_basebackup
BASE=$(mkdir -vp backups/$(date '+%d%m%y%H%M') | awk -F'[’‘]' '{print $2}')
echo "start a base backup: $BASE"
mkdir -vp $BASE/data
pg_basebackup -v -F p -D $BASE/data -x -c fast

# creation of a new tablespace, alter the table pgbench_accounts to set the new tablespace
mkdir -p $WORK_DIR/tbls
CREATE_CMD="CREATE TABLESPACE tbls LOCATION '$WORK_DIR/tbls'"
psql -c "$CREATE_CMD"
psql -c "ALTER TABLE pgbench_accounts SET TABLESPACE tbls" pgbench

# Doing some work on the database
pgbench -U postgres -T 120 pgbench

# a second, incremental backup with pg_basebackup, specifying the new
# location for the tablespace through the tablespace mapping
INCREMENTAL=$(mkdir -vp backups/$(date '+%d%m%y%H%M') | awk -F'[’‘]' '{print $2}')
echo "start an incremental backup: $INCREMENTAL"
mkdir -vp $INCREMENTAL/data $INCREMENTAL/tbls
pg_basebackup -v -F p -D $INCREMENTAL/data -x -I $BASE/data -T $WORK_DIR/tbls=$WORK_DIR/$INCREMENTAL/tbls -c fast

# restore the database
./pg_restorebackup.py -T $WORK_DIR/$INCREMENTAL/tbls=/tmp/tbls /tmp/data $BASE/data $INCREMENTAL/data
chmod 0700 /tmp/data/
echo port= >> /tmp/data/postgresql.conf
pg_ctl -D /tmp/data start

Now the restore works fine and the tablespace locations are preserved in the restored instance:

gbroccolo@arnold:~/pgsql (master %)$ psql -c "\db+"
                                  List of tablespaces
    Name    |  Owner   |          Location          | Access privileges | Options |  Size  | Description
------------+----------+----------------------------+-------------------+---------+--------+-------------
 pg_default | postgres |                            |                   |         | 37 MB  |
 pg_global  | postgres |                            |                   |         | 437 kB |
 tbls       | postgres | /home/gbroccolo/pgsql/tbls |                   |         | 80 MB  |
(3 rows)

gbroccolo@arnold:~/pgsql (master %)$ psql -p -c "\db+"
                   List of tablespaces
    Name    |  Owner   | Location  | Access privileges | Options |  Size  | Description
------------+----------+-----------+-------------------+---------+--------+-------------
 pg_default | postgres |           |                   |         | 37 MB  |
 pg_global  | postgres |           |                   |         | 437 kB |
 tbls       | postgres | /tmp/tbls |                   |         | 80 MB  |
(3 rows)

Thanks Marco for your reply. Giuseppe. -- Giuseppe Broccolo - 2ndQuadrant Italy PostgreSQL Training, Services and Support giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Parallel Seq Scan
Robert Haas robertmh...@gmail.com writes: The problem here, as I see it, is that we're flying blind. If there's just one spindle, I think it's got to be right to read the relation sequentially. But if there are multiple spindles, it might not be, but it seems hard to predict what we should do. We don't know what the RAID chunk size is or how many spindles there are, so any guess as to how to chunk up the relation and divide up the work between workers is just a shot in the dark. I thought the proposal to chunk on the basis of each worker processes one 1GB-sized segment should work all right. The kernel should see that as sequential reads of different files, issued by different processes; and if it can't figure out how to process that efficiently then it's a very sad excuse for a kernel. You are right that trying to do any detailed I/O scheduling by ourselves is a doomed exercise. For better or worse, we have kept ourselves at sufficient remove from the hardware that we can't possibly do that successfully. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 28, 2015 at 10:40 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The problem here, as I see it, is that we're flying blind. If there's just one spindle, I think it's got to be right to read the relation sequentially. But if there are multiple spindles, it might not be, but it seems hard to predict what we should do. We don't know what the RAID chunk size is or how many spindles there are, so any guess as to how to chunk up the relation and divide up the work between workers is just a shot in the dark. I thought the proposal to chunk on the basis of each worker processes one 1GB-sized segment should work all right. The kernel should see that as sequential reads of different files, issued by different processes; and if it can't figure out how to process that efficiently then it's a very sad excuse for a kernel. I agree. But there's only value in doing something like that if we have evidence that it improves anything. Such evidence is presently a bit thin on the ground. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
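The segment-per-worker scheme Tom describes could be sketched roughly as follows. This is a hypothetical illustration, not code from any patch on this thread; the names are invented, and 131072 is the number of 8kB blocks in a 1GB relation segment (RELSEG_SIZE with default build options):

```c
#include <stdint.h>

#define BLOCKS_PER_SEGMENT 131072   /* 8kB blocks per 1GB segment */

typedef struct BlockRange
{
    uint32_t start_block;
    uint32_t end_block;             /* exclusive; start == end means empty */
} BlockRange;

/*
 * Hand out whole 1GB segments round-robin: worker w takes segments
 * w, w + nworkers, w + 2*nworkers, ...  Each worker then reads its
 * segments sequentially, which the kernel sees as sequential reads
 * of distinct files.  Returns the i-th range owned by the given
 * worker, clamped to the relation size in blocks.
 */
BlockRange
worker_segment_range(uint32_t nblocks, int nworkers, int worker, int i)
{
    uint64_t    seg = (uint64_t) worker + (uint64_t) i * (uint64_t) nworkers;
    uint64_t    start = seg * BLOCKS_PER_SEGMENT;
    BlockRange  r = {0, 0};

    if (start < nblocks)
    {
        uint64_t    end = start + BLOCKS_PER_SEGMENT;

        r.start_block = (uint32_t) start;
        r.end_block = (uint32_t) (end < nblocks ? end : nblocks);
    }
    return r;
}
```

With two workers on a 300000-block relation, worker 0 gets [0, 131072) and [262144, 300000), worker 1 gets [131072, 262144) — each worker's I/O within a range is strictly sequential.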
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 28, 2015 at 7:46 AM, Robert Haas robertmh...@gmail.com wrote: All that aside, I still can't account for the numbers you are seeing. When I run with your patch and what I think is your test case, I get different (slower) numbers. And even if we've got 6 drives cranking along at 400MB/s each, that's still only 2.4 GB/s, not 6 GB/s. So I'm still perplexed. I have tried the tests again and found that I had forgotten to increase max_worker_processes, which is why the data was so different. Basically, at higher client counts it was just scanning a smaller number of blocks in the fixed chunk approach. So today I tried again after changing max_worker_processes and found that there is not much difference in performance at higher client counts. I will take some more data for both the block_by_block and fixed_chunk approaches and repost it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Wed, Jan 28, 2015 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-01-26 21:13:31 -0500, Robert Haas wrote: So maybe we should also do something like what LWLocks do, and make a union between the actual structure and an appropriate array of padding bytes - say either 64 or 128 of them. Hm. That's a bit bigger patch. I'm inclined to just let it slide for the moment. I still have plans to downsize some of sbufdesc's content (move the io lock out) and move the content lwlock inline. Then we're not going to have much choice but do this... Even if you didn't have plans like that, it would be entire folly to imagine that buffer headers will be exactly 64 bytes without some forcing function for that. Accordingly, I vote against applying any patch that pretends to improve their alignment unless it also does something to ensure that the size is a power of 2. Any notational changes that are forced by that would be much better done in a patch that does only that than in a patch that also makes functional changes to the header contents. I entirely agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
On 2015-01-28 14:54:15 +, Kevin Grittner wrote: Kevin Grittner kgri...@ymail.com wrote: Alexander Korotkov aekorot...@gmail.com wrote: Could we start snapshot-importing transaction with repeatable read isolation level? You can if you don't use the option which specifies that you want serializable behavior. Why specify --serializable-deferrable if you don't? AFAICS, they should read exactly same data as snapshot-exporting serializable transaction. Sort of. The behavior once they have a snapshot and are running is the same; the difference is whether the snapshot can see a transient state which would not be consistent with some serial order of transaction execution. Oh, wait; on a re-read I think I may have misunderstood the question. If you are talking about having pg_dump acquire a safe snapshot and have cooperating processes in the same pg_dump run use that snapshot in repeatable read transactions, then yes -- that would work. As long as a repeatable read transaction is using a safe snapshot it will not see any anomalies. That would be a better solution if it can be done. Do you have any code to suggest, or should I look at writing it? Why could it be unsafe to import a snapshot that's been generated as serializable deferrable into another backend? Doesn't the fact that it has been exported out of a deferrable xact that's still running pretty much guarantee that the other xact is also safe? ISTM that the check is just overzealous and/or needs to be moved into ImportSnapshot(). There it then could be made to check if the exporting xact was also deferrable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
Andres Freund and...@2ndquadrant.com wrote: Kevin Grittner kgri...@ymail.com wrote: Alexander Korotkov aekorot...@gmail.com wrote: Could we start snapshot-importing transaction with repeatable read isolation level? If you are talking about having pg_dump acquire a safe snapshot and have cooperating processes in the same pg_dump run use that snapshot in repeatable read transactions, then yes -- that would work. As long as a repeatable read transaction is using a safe snapshot it will not see any anomalies. Why could it be unsafe to import a snapshot that's been generated as serializable deferrable into another backend? That wouldn't be unsafe, which is what I was saying in the post you responded to. Doesn't the fact that it has been exported out of a deferrable xact that's still running pretty much guarantee that the other xact is also safe? As long as it is at least repeatable read, yes. ISTM that the check is just overzealous and/or needs to be moved into ImportSnapshot(). There it then could be made to check if the exporting xact was also deferrable. That would be great if ImportSnapshot had access to that information; I don't see it, though. Having pg_dump use repeatable read transactions for the processes that import the snapshot would work fine, as long as they are reading a snapshot which was captured by a serializable read only deferrable transaction. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
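For reference, the mechanism under discussion uses PostgreSQL's existing snapshot export/import facility; a sketch of what Kevin describes would look like this (the snapshot identifier shown is a made-up placeholder — pg_export_snapshot() returns the real one):

```sql
-- Coordinating (exporting) session: take a safe snapshot.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
SELECT pg_export_snapshot();   -- returns an identifier, e.g. '000004F6-1'

-- Worker session: adopt that snapshot under REPEATABLE READ,
-- as Kevin suggests, rather than SERIALIZABLE.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000004F6-1';
-- ... read-only dump queries here ...
COMMIT;
```

Because the exporting transaction is DEFERRABLE, its snapshot is known safe, so the repeatable read importers cannot see serialization anomalies either — which is the crux of the thread.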
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-28 10:35:28 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-01-26 21:13:31 -0500, Robert Haas wrote: So maybe we should also do something like what LWLocks do, and make a union between the actual structure and an appropriate array of padding bytes - say either 64 or 128 of them. Hm. That's a bit bigger patch. I'm inclined to just let it slide for the moment. I still have plans to downsize some of sbufdesc's content (move the io lock out) and move the content lwlock inline. Then we're not going to have much choice but do this... Even if you didn't have plans like that, it would be entire folly to imagine that buffer headers will be exactly 64 bytes without some forcing function for that. Meh. The 128 bytes additionally used by the alignment don't hurt in any case. But forcing all buffer descriptors to 64 bytes on a 32bit platform isn't guaranteed to be performance neutral. So, no, I don't think it's a folly to do so. I'd rather make actual progress that improves the absurd situation today (a factor of 2-3 by changing max_connections by one...) than argue whether the impact on 32bit platforms is acceptable before doing so. If we additionally decide to pad the struct, fine. But I don't see why this has to be done at the same time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Parallel Seq Scan
* Stephen Frost (sfr...@snowman.net) wrote: Such i/o systems do exist, but a single RAID5 group over spinning rust with a simple filter isn't going to cut it with a modern CPU- we're just too darn efficient to end up i/o bound in that case. err, to *not* end up i/o bound. Thanks, Stephen
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
Kevin Grittner kgri...@ymail.com wrote: Alexander Korotkov aekorot...@gmail.com wrote: Could we start snapshot-importing transaction with repeatable read isolation level? You can if you don't use the option which specifies that you want serializable behavior. Why specify --serializable-deferrable if you don't? AFAICS, they should read exactly same data as snapshot-exporting serializable transaction. Sort of. The behavior once they have a snapshot and are running is the same; the difference is whether the snapshot can see a transient state which would not be consistent with some serial order of transaction execution. Oh, wait; on a re-read I think I may have misunderstood the question. If you are talking about having pg_dump acquire a safe snapshot and have cooperating processes in the same pg_dump run use that snapshot in repeatable read transactions, then yes -- that would work. As long as a repeatable read transaction is using a safe snapshot it will not see any anomalies. That would be a better solution if it can be done. Do you have any code to suggest, or should I look at writing it? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
Andres Freund and...@2ndquadrant.com writes: On 2015-01-26 21:13:31 -0500, Robert Haas wrote: So maybe we should also do something like what LWLocks do, and make a union between the actual structure and an appropriate array of padding bytes - say either 64 or 128 of them. Hm. That's a bit bigger patch. I'm inclined to just let it slide for the moment. I still have plans to downsize some of sbufdesc's content (move the io lock out) and move the content lwlock inline. Then we're not going to have much choice but do this... Even if you didn't have plans like that, it would be entire folly to imagine that buffer headers will be exactly 64 bytes without some forcing function for that. Accordingly, I vote against applying any patch that pretends to improve their alignment unless it also does something to ensure that the size is a power of 2. Any notational changes that are forced by that would be much better done in a patch that does only that than in a patch that also makes functional changes to the header contents. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
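For what it's worth, the LWLock-style padding Robert alludes to amounts to a union between the struct and a fixed-size byte array. The following is a simplified sketch with made-up fields — not the actual sbufdesc — just to show the forcing function Tom asks for:

```c
#include <stdint.h>

/* A trimmed-down stand-in for the buffer header; the real sbufdesc
 * has more fields.  Everything here is illustrative only. */
typedef struct BufferDescData
{
    uint32_t    tag_hash;       /* hypothetical field */
    int32_t     buf_id;
    uint32_t    flags;
    int32_t     freeNext;
} BufferDescData;

/*
 * The LWLock trick: union the struct with a padding array so the
 * array stride is exactly a power of 2, regardless of the struct's
 * natural size.  A compile-time check (e.g. StaticAssertStmt in
 * PostgreSQL) would enforce that the struct actually fits.
 */
#define BUFFERDESC_PADDED_SIZE 64

typedef union BufferDescPadded
{
    BufferDescData  desc;
    char            pad[BUFFERDESC_PADDED_SIZE];
} BufferDescPadded;
```

An array of BufferDescPadded then has a 64-byte stride, so with a 64-byte-aligned base address no descriptor straddles a cache line — which is exactly the property the misalignment thread is chasing.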
Re: [HACKERS] pg_upgrade and rsync
Bruce, Stephen, etc.: So, I did a test trial of this and it seems like it didn't solve the issue of huge rsyncs. That is, the only reason to do this whole business via rsync, instead of doing a new basebackup of each replica, is to cut down on data transfer time by not resyncing the data from the old base directory. But in practice, the majority of the database files seem like they get transmitted anyway. Maybe I'm misreading the rsync output? Here's the setup: 3 Ubuntu 14.04 servers on AWS (tiny instance), running PostgreSQL 9.3.5, set up in cascading replication: 108 --> 107 --> 109. The goal was to test this with cascading, but I didn't get that far. I set up a pgbench workload, read-write on the master and read-only on the two replicas, to simulate a load-balanced workload. I was *not* logging hint bits. I then followed this sequence:

1) Install 9.4 packages on all servers.
2) Shut down the master.
3) pg_upgrade the master using --link
4) Shut down replica 107
5) rsync the master's $PGDATA from the replica:

rsync -aHv --size-only -e ssh --itemize-changes 172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/

... and got:

.d..t...... 9.4/main/pg_xlog/
>f+++++++++ 9.4/main/pg_xlog/0007000100CB
.d..t...... 9.4/main/pg_xlog/archive_status/

sent 126892 bytes  received 408645000 bytes  7640596.11 bytes/sec
total size is 671135675  speedup is 1.64

So that's 390MB of data transfer. If I look at the original directory:

postgres@paul: du --max-depth=1 -h
4.0K    ./.cache
20K     ./.ssh
424M    ./9.3
4.0K    ./.emacs.d
51M     ./9.4
56K     ./bench
474M    .

So 390MB were transferred out of a possible 474MB. That certainly seems like we're still transferring the majority of the data, even though I verified that the hard links are being sent as hard links. No? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Hi, attached is v9 of the patch, modified along the lines of Tom's comments: 1) uses alen=64 for cases with private context, 8 otherwise 2) reverts removal of element_type from initArrayResultArr() When element_type=InvalidOid is passed to initArrayResultArr, it performs lookup using get_element_type(), otherwise reuses the value it receives from the caller. 3) moves the assert into the 'if (release)' branch 4) includes the comments proposed by Ali Akbar in his reviews Warnings at makeArrayResult/makeMdArrayResult about freeing memory with private subcontexts. Regarding the performance impact of decreasing the size of the preallocated array from 64 to just 8 elements, I tried this. CREATE TABLE test AS SELECT mod(i,10) a, i FROM generate_series(1,64*10) s(i); SELECT a, array_agg(i) AS x FROM test GROUP BY 1; or actually (to minimize transfer overhead): SELECT COUNT(x) FROM ( SELECT a, array_agg(i) AS x FROM test GROUP BY 1 ) foo; with work_mem=2GB (so that it really uses HashAggregate). The dataset is constructed to have exactly 64 items per group, thus exploiting the difference between alen=8 and alen=64. With alen=8 I get these timings: Time: 1892,681 ms Time: 1879,046 ms Time: 1892,626 ms Time: 1892,155 ms Time: 1880,282 ms Time: 1868,344 ms Time: 1873,294 ms and with alen=64: Time: 1888,244 ms Time: 1882,991 ms Time: 1885,157 ms Time: 1868,935 ms Time: 1878,053 ms Time: 1894,871 ms Time: 1871,571 ms That's 1880 vs 1882 on average, so pretty much no difference. Would be nice if someone else could try this on their machine(s). 
regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index f3ce1d7..9eb4d63 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -262,7 +262,7 @@ ExecScanSubPlan(SubPlanState *node, /* Initialize ArrayBuildStateAny in caller's context, if needed */ if (subLinkType == ARRAY_SUBLINK) astate = initArrayResultAny(subplan-firstColType, - CurrentMemoryContext); + CurrentMemoryContext, true); /* * We are probably in a short-lived expression-evaluation context. Switch @@ -964,7 +964,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext) /* Initialize ArrayBuildStateAny in caller's context, if needed */ if (subLinkType == ARRAY_SUBLINK) astate = initArrayResultAny(subplan-firstColType, - CurrentMemoryContext); + CurrentMemoryContext, true); /* * Must switch to per-query memory context. diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c index 600646e..49fc23a 100644 --- a/src/backend/utils/adt/array_userfuncs.c +++ b/src/backend/utils/adt/array_userfuncs.c @@ -498,8 +498,13 @@ array_agg_transfn(PG_FUNCTION_ARGS) elog(ERROR, array_agg_transfn called in non-aggregate context); } - state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0); + if (PG_ARGISNULL(0)) + state = initArrayResult(arg1_typeid, aggcontext, false); + else + state = (ArrayBuildState *) PG_GETARG_POINTER(0); + elem = PG_ARGISNULL(1) ? (Datum) 0 : PG_GETARG_DATUM(1); + state = accumArrayResult(state, elem, PG_ARGISNULL(1), @@ -573,7 +578,12 @@ array_agg_array_transfn(PG_FUNCTION_ARGS) elog(ERROR, array_agg_array_transfn called in non-aggregate context); } - state = PG_ARGISNULL(0) ? 
NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0); + + if (PG_ARGISNULL(0)) + state = initArrayResultArr(arg1_typeid, InvalidOid, aggcontext, false); + else + state = (ArrayBuildStateArr *) PG_GETARG_POINTER(0); + state = accumArrayResultArr(state, PG_GETARG_DATUM(1), PG_ARGISNULL(1), diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index 5591b46..b8c4fba 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -4606,6 +4606,7 @@ array_insert_slice(ArrayType *destArray, * * element_type is the array element type (must be a valid array element type) * rcontext is where to keep working state + * subcontext is a flag determining whether to use a separate memory context * * Note: there are two common schemes for using accumArrayResult(). * In the older scheme, you start with a NULL ArrayBuildState pointer, and @@ -4615,24 +4616,43 @@ array_insert_slice(ArrayType *destArray, * once per element. In this scheme you always end with a non-NULL pointer * that you can pass to makeArrayResult; you get an empty array if there * were no elements. This is preferred if an empty array is what you want. + * + * It's possible to choose whether to create a separate memory context for the + * array builde state, or whether to allocate it directly within rcontext + * (along with various other pieces). This influences memory
Re: [HACKERS] pg_upgrade and rsync
On 01/28/2015 02:10 PM, Josh Berkus wrote: So 390MB were transferred out of a possible 474MB. That certainly seems like we're still transferring the majority of the data, even though I verified that the hard links are being sent as hard links. No? Looks like the majority of that was pg_xlog. Going to tear this down and start over, and --exclude pg_xlog. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
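For the record, the re-run Josh describes would look something like this — the host and paths are copied from his earlier message, with only --exclude added; a sketch of the intended command, not a tested recipe:

```shell
# Same rsync as before, but skip WAL: pg_xlog on the replica is
# rebuilt from streaming/archive anyway, so transferring it is wasted.
rsync -aHv --size-only --itemize-changes \
      --exclude 'pg_xlog' \
      -e ssh \
      172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/
```

With WAL excluded, the remaining transfer should be dominated by files whose size actually changed, which is the number the test is trying to measure.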
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 1/28/15 12:46 AM, Haribabu Kommi wrote: Also, what happens if someone reloads the config in the middle of running the SRF? hba entries are reloaded only in the postmaster process, not in every backend. So there shouldn't be any problem with config file reload. Am I missing something? Ahh, good point. That does raise another issue though... there might well be some confusion about that. I think people are used to the varieties of how settings come into effect that perhaps this isn't an issue with pg_settings, but it's probably worth at least mentioning in the docs for a pg_hba view. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] compiler warnings in copy.c
* Andres Freund (and...@2ndquadrant.com) wrote: On 2015-01-28 15:05:11 -0500, Stephen Frost wrote: Also, you seem to have pushed these commits with a date more than two weeks in the past. Please don't do that! Oh, wow, sorry about that. I had expected a rebase to update the date. It updates the committer, but not the author date. Use --pretty=fuller to see all the details. You can pass rebase --ignore-date to also reset the author date. Or commit --amend --reset-author. Thanks for the clarification as to what was happening. I've modified my aliases to use 'git am --ignore-date' which appears to have fixed this. Thanks! Stephen
Re: [HACKERS] Possible typo in create_policy.sgml
* Robert Haas (robertmh...@gmail.com) wrote: On Fri, Jan 9, 2015 at 3:46 PM, Stephen Frost sfr...@snowman.net wrote: A policy permits SELECT, INSERT, UPDATE or DELETE commands to access rows in a table that has row level security enabled. Access to existing table rows is granted if they match a policy expression specified via USING, while new rows that would be created via INSERT or UPDATE are checked against policy expressions specified via WITH CHECK. For policy expressions specified via USING which grant access to existing rows, the system will generally test the policy expressions prior to any qualifications that appear in the query itself, in order to prevent the inadvertent exposure of the protected data to user-defined functions which might not be trustworthy. However, functions and operators marked by the system (or the system administrator) as LEAKPROOF may be evaluated before policy expressions, as they are assumed to be trustworthy. I think that sticking "while new rows that would be created via INSERT or UPDATE are checked against policy expressions specified via WITH CHECK" into the middle of this is horribly confusing, as it's a completely separate mechanism from the rest of what's being discussed here. I think there needs to be some initial language that clarifies that USING expressions apply to old rows and WITH CHECK expressions to new rows, and then you can go into more detail. But mentioning WITH CHECK parenthetically in the middle of the rest of this I think will not lead to clarity. I agree, especially after going back and re-reading this while fixing the issue mentioned earlier by Peter (which was an orthogonal complaint about the shadowing of WITH CHECK by USING, if WITH CHECK isn't specified). We really need a paragraph on USING policies and another on WITH CHECK policies. 
How about a reword along these lines: When row level security is enabled on a table, all access to that table by users other than the owner or a superuser must be through a policy. This requirement applies to both selecting out existing rows from the table and to adding rows to the table (through either INSERT or UPDATE). Granting access to existing rows in a table is done by specifying a USING expression which will be added to queries which reference the table. Every row in the table for which the USING expression returns true will be visible. Granting access to add rows to a table is done by specifying a WITH CHECK expression. A WITH CHECK expression must return true for every row being added to the table or an error will be returned and the command will be aborted. For policy expressions specified via USING which grant access to existing rows, the system will generally test the policy expressions prior to any qualifications that appear in the query itself, in order to prevent the inadvertent exposure of the protected data to user-defined functions which might not be trustworthy. However, functions and operators marked by the system (or the system administrator) as LEAKPROOF may be evaluated before policy expressions, as they are assumed to be trustworthy. Thoughts? Thanks! Stephen
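A minimal example may make the USING/WITH CHECK split the reword describes concrete (table, column, and policy names here are invented for illustration):

```sql
CREATE TABLE accounts (owner text, balance numeric);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Existing rows are visible only where the USING expression is true;
-- rows added via INSERT or UPDATE must satisfy WITH CHECK or the
-- command errors out.
CREATE POLICY account_owner ON accounts
    USING (owner = current_user)
    WITH CHECK (owner = current_user);
```

With this policy, a non-owner, non-superuser role sees only its own rows and cannot insert rows owned by anyone else — the two halves of the behavior the proposed paragraphs separate.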
Re: [HACKERS] Small bug on CREATE EXTENSION pgq...
* David Johnston (david.g.johns...@gmail.com) wrote: On Wednesday, January 28, 2015, Stephen Frost sfr...@snowman.net wrote: Ehh.. Shouldn't we try to take a bit more care that we reset things after a CREATE EXTENSION is run? Not really sure how much effort we want to put into it, but I see a bit of blame on both sides here. Fair enough but reset to what? I don't know the internal mechanics but if the session default is warning and a local change sets it to notice then an unconditional reset would not get us back to the intended value. Yeah, we'd really want to reset it to what it was before. Now, if extensions can be handled similarly to how functions operate, where one can define function/extension -local settings and have them revert after resolution that might be ok. That, imv, is really what should be happening inside the create extension script.. I'm not anxious to look into how to make that happen though. That said, while there isn't any way to prevent it the log_min GUCs should not be changed by extensions; that decision should be left up to the user. The complaint is not that it should be reset but that the installation script should not even care what the setting is. I agree that the script really shouldn't be changing it. Thanks! Stephen
Re: [HACKERS] Small bug on CREATE EXTENSION pgq...
On Wednesday, January 28, 2015, Stephen Frost sfr...@snowman.net wrote: * David G Johnston (david.g.johns...@gmail.com) wrote: Jerry Sievers-3 wrote Hackers; I noticed this trying to import a large pg_dump file with warnings supressed. It seems loading pgq sets client_min_messages to warning and leaves it this way which defeats an attempt to change the setting prior and have it stick. I tested with several other extensions in same DB and only pgq has the problem. Sorry if this is known/fixed already. This is not part of PostgreSQL proper and thus not supported by -hackers; you should report this directly to the authors. Ehh.. Shouldn't we try to take a bit more care that we reset things after a CREATE EXTENSION is run? Not really sure how much effort we want to put into it, but I see a bit of blame on both sides here. Fair enough but reset to what? I don't know the internal mechanics but if the session default is warning and a local change sets it to notice then an unconditional reset would not get us back to the intended value. Now, if extensions can be handled similarly to how functions operate, where one can define function/extension -local settings and have them revert after resolution that might be ok. That said, while there isn't any way to prevent it the log_min GUCs should not be changed by extensions; that decision should be left up to the user. The complaint is not that it should be reset but that the installation script should not even care what the setting is. David J.
Re: [HACKERS] Possible typo in create_policy.sgml
* Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 7, 2015 at 3:06 PM, Stephen Frost sfr...@snowman.net wrote: If I'm following correctly, Peter's specifically talking about: [ USING ( replaceable class=parameterexpression/replaceable ) ] [ WITH CHECK ( replaceable class=parametercheck_expression/replaceable ) ] Where the USING parameter is 'expression' but the WITH CHECK parameter is 'check_expression'. He makes a good point, I believe, as expression is overly generic. I don't like the idea of using barrier_expression though as that ends up introducing a new term- how about using_expression? Oh. Well, I guess we could change that. I don't think it's a problem, myself. I thought he was talking about something in the SGML markup. I agree that it's not a big deal, but I agree with Peter that it's worthwhile to clarify, especially since this will be seen in psql's \h w/o the rest of the context of the CREATE POLICY documentation. I've gone ahead and made this minor change. Thanks! Stephen
Re: [HACKERS] Possible typo in create_policy.sgml
Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: [There's also a typo further down -- "filter out the records which are visible" should be "not visible"] I agree, that's not really worded quite right. I've reworded this along the lines of what you suggested (though not exactly -- if you get a chance to review it, that'd be great) and pushed it, so we don't forget to do so later. What do you think of the attached rewording? Regarding the larger/earlier paragraph, would be great if you could comment on my latest suggestion. Thanks! Stephen
Re: [HACKERS] Small bug on CREATE EXTENSION pgq...
Stephen Frost sfr...@snowman.net writes: * David Johnston (david.g.johns...@gmail.com) wrote: Fair enough but reset to what? I don't know the internal mechanics but if the session default is warning and a local change sets it to notice then an unconditional reset would not get us back to the intended value. Yeah, we'd really want to reset it to what it was before. An extension script runs as a single transaction, so SET LOCAL could've been used to accomplish the result without trashing the session-lifespan setting. I'm not sure whether or not there was good reason to be changing the setting at all, but it's entirely on the extension script's head that it didn't do this in a less invasive way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: On 1/28/15 9:56 AM, Stephen Frost wrote: Such i/o systems do exist, but a single RAID5 group over spinning rust with a simple filter isn't going to cut it with a modern CPU -- we're just too darn efficient to end up i/o bound in that case. A more complex filter might be able to change it over to being more CPU bound than i/o bound and produce the performance improvements you're looking for. Except we're nowhere near being IO efficient. The vast difference between Postgres IO rates and dd shows this. I suspect that's because we're not giving the OS a list of IO to perform while we're doing our thing, but that's just a guess. Uh, huh? The dd was ~321000 and the slowest uncached PG run from Robert's latest tests was 337312.554, based on my inbox history at least. I don't consider a ~4-5% difference to be vast. The caveat to this is if you have multiple i/o *channels* (which it looks like you don't in this case) where you can parallelize across those channels by having multiple processes involved. Keep in mind that multiple processes is in no way a requirement for that. Async IO would do that, or even just requesting stuff from the OS before we need it. While I agree with this in principle, experience has shown that it doesn't tend to work out as well as we'd like with a single process. We only support multiple i/o channels today with tablespaces and we can't span tables across tablespaces. That's a problem when working with large data sets, but I'm hopeful that this work will eventually lead to a parallelized Append node that operates against a partitioned/inherited table to work across multiple tablespaces. Until we can get a single seqscan close to dd performance, I fear worrying about tablespaces and IO channels is entirely premature.
I feel like one of us is misunderstanding the numbers, which is probably in part because they're a bit piecemeal over email, but the seqscan speed in this case looks pretty close to dd performance for this particular test, when things are uncached. Cached numbers are different, but that's not what we're discussing here, I don't think. Don't get me wrong -- I've definitely seen cases where we're CPU bound because of complex filters, etc, but that doesn't seem to be the case here. Thanks! Stephen
Re: [HACKERS] Proposal: two new role attributes and/or capabilities?
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: On 12/23/14 12:52 PM, Stephen Frost wrote: Autovacuum can certainly run vacuum/analyze on a few tables every 12 hours, so I'm not really following where you see autovacuum being unable to cope. I agree that there *are* such cases, but getting more information about those cases and exactly what solution *does* work would really help us improve autovacuum to address those use-cases. (going through some old email...) The two cases I've dealt with recently are: - Tables with a fair update/delete rate that should always stay small. The problem with these tables is that if anything happens to upset vacuuming, you can end up with a significantly larger than expected table that's now essentially impossible to shrink. This could be caused by a single long-running transaction that happens to be in play when autovac kicks off, or for other reasons. Even once you manage to get all the tuples off the end of the heap, it can still be extremely difficult to grab the lock you need to truncate it. Running a vacuum every minute from cron seems to help control this. Sadly, your indexes still get bloated, so occasionally you want to re-cluster too. The difference between the autovacuum-run vacuum and the cron-run vacuum is that the one running out of cron will just keep holding the lock until it's actually able to truncate the end of the relation, no? I recall discussion previously that we need a way to either support that in autovacuum for (a configurable set of) regular relations or come up with a solution that doesn't require that lock. - Preemptively vacuuming during off-hours. Many sites have either nightly or weekend periods of reduced load. Such sites can gain a great benefit from scheduling preemptive vacuums to reduce the odds of disruptive vacuuming activity during heavy activity periods. This is especially true when it comes to a scan_all vacuum of a large table; having autovac do one of those at a peak period can really hose things.
Having preferable times for autovacuum to run vacuums would certainly be nice to support this use-case. All that said, I'm not against a role attribute which allows the user to vacuum/analyze anything. I do think that's a bit different from the existing effort to reduce the activities which require superuser, as with the vacuum/analyze case you *could* have a single role that's a member of every role that owns the relations which you want to vacuum/analyze. I grant that it's a bit awkward though. Thanks, Stephen
[HACKERS] Small bug on CREATE EXTENSION pgq...
Hackers; I noticed this trying to import a large pg_dump file with warnings suppressed. It seems loading pgq sets client_min_messages to warning and leaves it this way, which defeats an attempt to change the setting prior and have it stick. I tested with several other extensions in same DB and only pgq has the problem. Sorry if this is known/fixed already. Thanks

sj$ cat q
select version();
create database foo template template0;
\c foo
show client_min_messages;
create extension pgq;
show client_min_messages;
reset client_min_messages;
show client_min_messages;
create extension pgq_node;
show client_min_messages;
\c postgres
drop database foo;

sj$ /usr/local/postgresql-9.3/bin/psql -ef q --no-psqlrc
select version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
(1 row)

create database foo template template0;
CREATE DATABASE
psql (9.3.5, server 9.3.4)
You are now connected to database "foo" as user "jsievers".
show client_min_messages;
 client_min_messages
---------------------
 notice
(1 row)

create extension pgq;
CREATE EXTENSION
show client_min_messages;
 client_min_messages
---------------------
 warning
(1 row)

reset client_min_messages;
RESET
show client_min_messages;
 client_min_messages
---------------------
 notice
(1 row)

create extension pgq_node;
CREATE EXTENSION
show client_min_messages;
 client_min_messages
---------------------
 notice
(1 row)

psql (9.3.5, server 9.3.4)
You are now connected to database "postgres" as user "jsievers".
drop database foo;
DROP DATABASE
sj$

-- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net p: 312.241.7800
Re: [HACKERS] Small bug on CREATE EXTENSION pgq...
Jerry Sievers-3 wrote Hackers; I noticed this trying to import a large pg_dump file with warnings suppressed. It seems loading pgq sets client_min_messages to warning and leaves it this way, which defeats an attempt to change the setting prior and have it stick. I tested with several other extensions in same DB and only pgq has the problem. Sorry if this is known/fixed already. This is not part of PostgreSQL proper and thus not supported by -hackers; you should report this directly to the authors. David J. -- View this message in context: http://postgresql.nabble.com/Small-bug-on-CREATE-EXTENSION-pgq-tp5835899p5835900.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Possible typo in create_policy.sgml
Peter, * Peter Geoghegan (p...@heroku.com) wrote: I also don't see this behavior documented (this is from process_policies()): [...] But is that really the right place for it? Does it not equally well apply to FOR UPDATE policies, that can on their own have both barrier quals and WITH CHECK options? Basically, that seems to me like a *generic* property of policies (it's a generic thing that the WITH CHECK options/expressions shadow the USING security barrier quals as check options), and so should be documented as such. Thanks, you're right, the documentation there can be improved. I've pushed up a change to clarify that the USING quals will be used for the WITH CHECK case if no WITH CHECK quals are specified. Hopefully that's clear now, but please let me know if you feel further improvements to this would help. I do think we will be making additional changes in this area before too long, but it's good to get it cleaned up now anyway, so we don't forget to do so later. Thanks! Stephen
Re: [HACKERS] Small bug on CREATE EXTENSION pgq...
* David G Johnston (david.g.johns...@gmail.com) wrote: Jerry Sievers-3 wrote Hackers; I noticed this trying to import a large pg_dump file with warnings suppressed. It seems loading pgq sets client_min_messages to warning and leaves it this way, which defeats an attempt to change the setting prior and have it stick. I tested with several other extensions in same DB and only pgq has the problem. Sorry if this is known/fixed already. This is not part of PostgreSQL proper and thus not supported by -hackers; you should report this directly to the authors. Ehh.. Shouldn't we try to take a bit more care that we reset things after a CREATE EXTENSION is run? Not really sure how much effort we want to put into it, but I see a bit of blame on both sides here. Thanks, Stephen
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 28, 2015 at 8:27 PM, Stephen Frost sfr...@snowman.net wrote: I feel like one of us is misunderstanding the numbers, which is probably in part because they're a bit piecemeal over email, but the seqscan speed in this case looks pretty close to dd performance for this particular test, when things are uncached. Cached numbers are different, but that's not what we're discussing here, I don't think. Don't get me wrong -- I've definitely seen cases where we're CPU bound because of complex filters, etc, but that doesn't seem to be the case here. To try to clarify a bit: What we've been testing here is a function I wrote called parallel_count(regclass), which counts all the visible tuples in a named relation. That runs as fast as dd, and giving it extra workers or prefetching or the ability to read the relation with different I/O patterns never seems to speed anything up very much. The story with parallel sequential scan itself may well be different, since that has a lot more CPU overhead than a dumb-simple tuple-counter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Missing updates at few places for row level security
Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: There is a new column added in pg_authid (rolbypassrls) and the update for it is missing in the below places: a. System catalog page for pg_authid http://www.postgresql.org/docs/devel/static/catalog-pg-authid.html b. Do we want to add this new column in the pg_shadow view? Thanks for this. Pushed with a few additional changes (included it in both pg_user and pg_shadow, a few other doc updates, etc). Thanks again! Stephen
Re: [HACKERS] pg_upgrade and rsync
On 01/28/2015 02:28 PM, Josh Berkus wrote: On 01/28/2015 02:10 PM, Josh Berkus wrote: So 390MB were transferred out of a possible 474MB. That certainly seems like we're still transferring the majority of the data, even though I verified that the hard links are being sent as hard links. No? Looks like the majority of that was pg_xlog. Going to tear this down and start over, and --exclude pg_xlog. So, having redone this without the pg_xlog lag, this appears to work in terms of cutting down the rsync volume. I'm concerned about putting this in the main docs, though. This is a complex, and fragile procedure, which is very easy to get wrong, and hard to explain for a generic case. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] pg_upgrade and rsync
* Josh Berkus (j...@agliodbs.com) wrote: On 01/28/2015 02:28 PM, Josh Berkus wrote: On 01/28/2015 02:10 PM, Josh Berkus wrote: So 390MB were transferred out of a possible 474MB. That certainly seems like we're still transferring the majority of the data, even though I verified that the hard links are being sent as hard links. No? Looks like the majority of that was pg_xlog. Going to tear this down and start over, and --exclude pg_xlog. So, having redone this without the pg_xlog lag, this appears to work in terms of cutting down the rsync volume. I'm concerned about putting this in the main docs, though. This is a complex, and fragile procedure, which is very easy to get wrong, and hard to explain for a generic case. So, for my 2c, I'm on the fence about it. On the one hand, I agree, it's a bit of a complex process to get right. On the other hand, it's far better if we put something out there along the lines of if you really want to, this is how to do it than having folks try to fumble through to find the correct steps themselves. Thanks, Stephen
[HACKERS] Exposing the stats snapshot timestamp to SQL
In a previous thread Tom Lane said: (I'm also wondering if it'd make sense to expose the stats timestamp as a callable function, so that the case could be dealt with programmatically as well. But that's future-feature territory.) (http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us) It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it. The main purpose of this patch is to expose the timestamp of the current stats snapshot so that it can be leveraged by monitoring code. The obvious case is alerting if the stats collector becomes unresponsive. The common use case is to smooth out graphs which are built from high-frequency monitoring of the stats collector. The timestamp is already available as part of PgStat_GlobalStats. This patch is just the boilerplate (plus docs and tests) needed to expose that value to SQL. It shouldn't impact anything else in the server. I'm not particularly attached to the function name, but I didn't have a better idea. The patch should apply cleanly to master. - Matt K pg_stat_snapshot_timestamp_v1.patch Description: Binary data
Re: [HACKERS] pg_upgrade and rsync
So, for my 2c, I'm on the fence about it. On the one hand, I agree, it's a bit of a complex process to get right. On the other hand, it's far better if we put something out there along the lines of if you really want to, this is how to do it than having folks try to fumble through to find the correct steps themselves. So, here are the correct steps for Bruce, because his current doc does not cover all of these. I really think this should go in as a numbered set of steps; the current doc has some steps as steps, and other stuff buried in paragraphs.

1. Install the new version binaries on both servers, alongside the old version.
2. If not done by the package install, initdb the new version's data directory.
3. Check that the replica is not very lagged. If it is, wait for traffic to die down and for it to catch up.
4. Shut down the master using -m fast or -m smart for a clean shutdown. It is not necessary to shut down the replicas yet.
5. pg_upgrade the master using the --link option. Do not start the new version yet.
6. Create a data directory for the new version on the replica. This directory should be empty; if it was initdb'd by the installation package, then delete its contents.
7. Shut down postgres on the replica.
8. rsync both the old and new data directories from the master to the replica, using the --size-only and -H (hard links) options. For example, if both 9.3 and 9.4 are in /var/lib/postgresql, do:

   rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/ replica-host:/var/lib/postgresql/

9. Create a recovery.conf file in the replica's data directory with the appropriate parameters.
10. Start the master, then the replica.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Parallel Seq Scan
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown t...@linux.com wrote: On 28 January 2015 at 14:03, Robert Haas robertmh...@gmail.com wrote: The problem here, as I see it, is that we're flying blind. If there's just one spindle, I think it's got to be right to read the relation sequentially. But if there are multiple spindles, it might not be, but it seems hard to predict what we should do. We don't know what the RAID chunk size is or how many spindles there are, so any guess as to how to chunk up the relation and divide up the work between workers is just a shot in the dark. Can't the planner take effective_io_concurrency into account? Maybe. It's answering somewhat the right question -- telling us how many parallel I/O channels we think we've got. But I'm not quite sure what to do with that information in this case. I mean, if we've got effective_io_concurrency = 6, does that mean it's right to start scans in 6 arbitrary places in the relation and hope that keeps all the drives busy? That seems like throwing darts at the wall. We have no idea which parts are on which underlying devices. Or maybe it means we should prefetch 24MB, on the assumption that the RAID stripe is 4MB? That's definitely blind guesswork. Considering the email Amit just sent, it looks like on this machine, regardless of what algorithm we used, the scan took between 3 minutes and 5.5 minutes, and most of them took between 4 minutes and 5.5 minutes. The results aren't very predictable, more workers don't necessarily help, and it's not really clear that any algorithm we've tried is clearly better than any other. I experimented with prefetching a bit yesterday, too, and it was pretty much the same. Some settings made it slightly faster. Others made it slower. Whee! I researched this topic a while ago.
One notable fact is that active prefetching disables automatic readahead prefetching (by the Linux kernel), which can occur in larger granularities than 8K. Automatic readahead prefetching occurs when consecutive addresses are read, which may happen by a seqscan but also by accident through an indexscan in correlated cases. My conclusion was to NOT prefetch seqscans, because the OS does well enough without advice. Prefetching indexscan heap accesses is very valuable though, but you need to detect the accidental sequential accesses so as to not hurt your performance in correlated cases. In general I can give you the hint to not only focus on HDDs with their single spindle. A single SATA SSD scales up to 32 (31 on Linux) requests in parallel (without RAID or anything else). The difference in throughput is extreme for this type of storage device. While single spinning HDDs can only gain up to ~20% by NCQ, SATA SSDs can easily gain up to 700%. +1 for using effective_io_concurrency to tune for this, since prefetching random addresses is effectively a type of parallel I/O. Regards, Daniel -- MSc. Daniel Bausch Research Assistant (Computer Science) Technische Universität Darmstadt http://www.dvs.tu-darmstadt.de/staff/dbausch
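The explicit-advice mechanism being debated above can be shown in isolation. The sketch below is not PostgreSQL code: the file name, block size, and `prefetch_demo()` are invented for illustration; only the primitive, posix_fadvise(POSIX_FADV_WILLNEED), is the one PostgreSQL actually uses for effective_io_concurrency-driven prefetching (assumes a POSIX system).

```c
/*
 * Minimal sketch, not PostgreSQL code: issue explicit prefetch advice
 * for a "random" block before reading it.  For a purely sequential
 * scan the kernel's own readahead typically covers this already,
 * which is the point made in the mail above.
 */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEMO_BLCKSZ  8192
#define DEMO_NBLOCKS 16

/* Returns 0 on success: write some pages, advise, read one back. */
int
prefetch_demo(void)
{
    char    tmpl[] = "/tmp/prefetch_demo_XXXXXX";
    char    page[DEMO_BLCKSZ];
    char    buf[DEMO_BLCKSZ];
    off_t   target = 7 * (off_t) DEMO_BLCKSZ;
    int     fd = mkstemp(tmpl);
    int     i;

    if (fd < 0)
        return -1;
    unlink(tmpl);               /* file vanishes once fd is closed */

    /* Write a few "pages" so there is something to read back. */
    memset(page, 'x', sizeof(page));
    for (i = 0; i < DEMO_NBLOCKS; i++)
        if (write(fd, page, DEMO_BLCKSZ) != DEMO_BLCKSZ)
            return -1;

    /* Explicit advice: we will need this block soon. */
    if (posix_fadvise(fd, target, DEMO_BLCKSZ, POSIX_FADV_WILLNEED) != 0)
        return -1;

    /* The later real read should ideally find the page cached. */
    if (pread(fd, buf, DEMO_BLCKSZ, target) != DEMO_BLCKSZ)
        return -1;
    if (buf[0] != 'x')
        return -1;

    return close(fd);
}
```

If the kernel's readahead already covers the access pattern (the sequential case), the advice is redundant, which matches the observation above that prefetching seqscans buys little.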
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On Wed, Jan 28, 2015 at 1:13 PM, Jim Nasby jim.na...@bluetreble.com wrote: My $0.01: While I sympathize with Noah's sentiments, the only thing that makes sense to me is that a JSON text field is treated the same way as we treat text. Right now, that means NUL is not allowed, period. If no one has bitched about this with text, is it really that big a problem with JSON? I would bitch about it for text if I thought that it would do any good.
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Hi

I am testing this feature on a relatively complex schema (38619 tables in db) and I got a deadlock

[pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4
vacuumdb: vacuuming database test2
vacuumdb: vacuuming of database test2 failed: ERROR:  deadlock detected
DETAIL:  Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689.
HINT:  See server log for query details.

ERROR:  deadlock detected
DETAIL:  Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689.
Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
HINT:  See server log for query details.
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_class;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
ERROR:  canceling statement due to user request
FATAL:  connection to client lost

   Schema   |      Name      | Type  |  Owner   |  Size  | Description
------------+----------------+-------+----------+--------+-------------
 pg_catalog | pg_attribute   | table | postgres | 439 MB |
 pg_catalog | pg_rewrite     | table | postgres | 314 MB |
 pg_catalog | pg_proc        | table | postgres | 136 MB |
 pg_catalog | pg_depend      | table | postgres | 133 MB |
 pg_catalog | pg_class       | table | postgres | 69 MB  |
 pg_catalog | pg_attrdef     | table | postgres | 55 MB  |
 pg_catalog | pg_trigger     | table | postgres | 47 MB  |
 pg_catalog | pg_type        | table | postgres | 31 MB  |
 pg_catalog | pg_description | table | postgres | 23 MB  |
 pg_catalog | pg_index       | table | postgres | 20 MB  |
 pg_catalog | pg_constraint  | table | postgres | 17 MB  |
 pg_catalog | pg_shdepend    | table | postgres | 17 MB  |
 pg_catalog | pg_statistic   | table | postgres | 928 kB |
 pg_catalog | pg_operator    | table | postgres | 552 kB |
 pg_catalog | pg_collation   | table | postgres | 232 kB |

Regards Pavel Stehule

2015-01-27 3:26 GMT+01:00 Dilip kumar dilip.ku...@huawei.com: On 23 January 2015 21:10, Alvaro Herrera Wrote: In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. But that's for another patch anyway (and this new function needs a little battle-testing, of course.) Thanks for your effort, I will take it up for the next commitfest..
Re: [HACKERS] PQgetssl() and alternative SSL implementations
On Wed, Jan 28, 2015 at 10:13 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's a patch to implement the above scheme. It adds four functions to libpq, to interrogate the SSL status:

int PQsslInUse(const PGconn *conn)
    Returns true (1) if the connection uses SSL, false (0) if not.

const char *PQsslAttribute(const PGconn *conn, const char *attribute_name)
    Returns a piece of information. The list of attributes depends on the implementation, but there are a few that are expected to be supported by all of them. See docs for details.

const char **PQsslAttributes(const PGconn *conn);
    Return an array of SSL attribute names available.

void *PQsslStruct(const PGconn *conn, const char *struct_name)
    Return a pointer to an SSL-implementation-specific object describing the connection. PQsslStruct(conn, "OpenSSL SSL") is equivalent to PQgetssl(conn).

I think this is expandable enough, because you can easily add attributes later on, and different implementations can support different attributes. It contains the escape hatch for applications that need to do more, and have intimate knowledge of OpenSSL structs. It's also pretty easy to use. I like it! Although I think "OpenSSL SSL" is a little bit duplicatively redundant. Why not just "OpenSSL"? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
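The attribute interface proposed above boils down to a lookup-by-name pattern. Here is a standalone sketch of that pattern -- this is not libpq code: the table contents and the demo_* names are invented; only the API shape (value-or-NULL lookup plus a NULL-terminated name list) mirrors the proposed PQsslAttribute()/PQsslAttributes().

```c
/*
 * Sketch of the name-based attribute lookup pattern.  Table contents
 * and helper names are invented; an OpenSSL-backed implementation
 * would fill the table from the live SSL object instead.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_attr
{
    const char *name;
    const char *value;
};

/* What an OpenSSL-backed connection might plausibly report. */
static const struct demo_attr demo_attrs[] = {
    { "library",  "OpenSSL" },
    { "protocol", "TLSv1.2" },
    { "cipher",   "ECDHE-RSA-AES256-GCM-SHA384" },
    { "key_bits", "256" },
    { NULL, NULL }
};

/* Like PQsslAttribute(): value for a name, or NULL if unsupported. */
const char *
demo_ssl_attribute(const char *name)
{
    const struct demo_attr *a;

    for (a = demo_attrs; a->name != NULL; a++)
        if (strcmp(a->name, name) == 0)
            return a->value;
    return NULL;                /* unknown attribute */
}

/* Like PQsslAttributes(): NULL-terminated array of attribute names. */
const char **
demo_ssl_attributes(void)
{
    static const char *names[5];
    int i;

    for (i = 0; demo_attrs[i].name != NULL; i++)
        names[i] = demo_attrs[i].name;
    names[i] = NULL;
    return names;
}
```

An implementation can add attributes later without breaking callers, which is exactly the expandability argument made in the mail: unknown names simply return NULL.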
Re: [HACKERS] Sequence Access Method WIP
On 28/01/15 18:09, Heikki Linnakangas wrote: On 01/23/2015 02:34 AM, Petr Jelinek wrote: On 22/01/15 17:02, Petr Jelinek wrote: The new version (the one that is not submitted yet) of gapless sequence is way more ugly and probably not best example either but does guarantee gaplessness (it stores the last value in it's own value table). So I am not sure if it should be included either... Here it is as promised. I generally like the division of labour between the seqam implementations and the API now. I don't like the default_sequenceam GUC. That seems likely to create confusion. And it's not really enough for something like a remote sequence AM anyway that definitely needs some extra reloptions, like the hostname of the remote server. The default should be 'local', unless you specify something else with CREATE SEQUENCE USING - no GUCs. Hmm, I am not too happy about this, I want SERIAL to work with custom sequences (as long as it's possible for the sequence to work that way at least). That's actually important feature for me. Your argument about this being potential problem for some sequenceAMs is valid though. Some comments on pg_dump: * In pg_dump's dumpSequence function, check the server version number instead of checking whether pg_sequence_dump_state() function exists. That's what we usually do when new features are introduced. And there's actually a bug there: you have the check backwards. (try dumping a database with any sequences in it; it fails) Right. * pg_dump should not output a USING clause when a sequence uses the 'local' sequence. No point in adding such noise - making the SQL command not standard-compatible - for the 99% of databases that don't use other sequence AMs. Well this only works if we remove the GUC. Because otherwise if GUC is there then you always need to either add USING clause or set the GUC in advance (like we do with search_path for example). * Be careful to escape all strings correctly in pg_dump. 
I think you're missing escaping for the 'state' variable at least. Ouch. In sequence_save_tuple:

else
{
    /*
     * New tuple was not sent, so the original tuple was probably just
     * changed inline, all we need to do is mark the buffer dirty and
     * optionally log the update tuple.
     */
    START_CRIT_SECTION();
    MarkBufferDirty(seqh->buf);
    if (do_wal)
        log_sequence_tuple(seqh->rel, seqh->tup, seqh->buf, page);
    END_CRIT_SECTION();
}

This is wrong when checksums are enabled and !do_wal. I believe this should use MarkBufferDirtyHint(). Oh ok, didn't realize. Notable changes: - The gapless sequence rewritten to use the transactional storage as that's the only way to guarantee gaplessness between dump and restore. Can you elaborate? Using the auxiliary seqam_gapless_values is a bit problematic. First of all, the event trigger on DROP SEQUENCE doesn't fire if you drop the whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, Yeah, but at worst there are some unused rows there; it's not too bad. I could also create a relation per sequence so that the dependency code would handle everything correctly, but it seems a bit too expensive to create not one but two relations per sequence... updating a row in a table for every nextval() call is pretty darn expensive. Yes it's expensive, but the gapless sequence also serializes access, so it will never be a speed demon. But I don't actually see a problem with dump and restore. The problem is that the tuple stored in the sequence relation is always the one with the latest state (and with frozen xid), so pg_dump has no way of getting the value as it was at the time it took the snapshot. This means that after a dump/restore cycle, the sequence can be further ahead than the table it's attached to and you end up with a gap there. Can you rewrite it without using the values table? AFAICS, there are two problems to solve: 1. If the top-level transaction aborts, you need to restore the old value.
I suggest keeping two values in the sequence tuple: the old, definitely committed one, and the last value. The last value is only considered current if the associated XID committed; otherwise the old value is current. When a transaction is about to commit, it stores its top-level XID and the new value in the last field, and copies the previously current value to the old field. Yes, this is how the previous implementation worked. 2. You need to track the last value on a per-subtransaction basis, until the transaction commits/rolls back, in order to have rollback to savepoint to retreat the sequence's value. That can be done in backend-private memory, maintaining a stack of subtransactions and the last value of each. Use RegisterSubXactCallback to hook into subxact commit/abort. Just before top-level commit (in pre-commit callback), update the sequence tuple with the latest value, so that it gets WAL-logged
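The two-slot scheme suggested above can be sketched in isolation. This is not the patch under discussion: the struct, the function names, and the committed-XID callback are all invented; the point is just the visibility rule (the "last" value counts only if the XID that wrote it committed, otherwise readers fall back to the "old" value).

```c
/*
 * Sketch only -- not the actual patch.  An "old" definitely-committed
 * value plus a "last" value tagged with its writer's XID; "last" is
 * current only if that XID committed.  All names are invented.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct GaplessSeq
{
    int64_t  old_value;     /* last definitely-committed value */
    int64_t  last_value;    /* value written by last_xid */
    uint32_t last_xid;      /* 0 = no pending writer */
} GaplessSeq;

/* Visibility callback: did this XID commit? */
typedef bool (*xid_committed_fn) (uint32_t xid);

/* Demo rule so the logic runs standalone: even XIDs committed. */
static bool
demo_committed(uint32_t xid)
{
    return xid % 2 == 0;
}

int64_t
gapless_current(const GaplessSeq *seq, xid_committed_fn committed)
{
    if (seq->last_xid != 0 && committed(seq->last_xid))
        return seq->last_value;
    return seq->old_value;
}

/* At pre-commit time: publish a new value and remember who wrote it. */
void
gapless_publish(GaplessSeq *seq, xid_committed_fn committed,
                uint32_t my_xid, int64_t new_value)
{
    /* Fold whatever is currently visible into the "old" slot first. */
    seq->old_value = gapless_current(seq, committed);
    seq->last_value = new_value;
    seq->last_xid = my_xid;
}

/* Writer aborts (odd demo XID): readers must still see the old value. */
int64_t
demo_aborted_writer(void)
{
    GaplessSeq s = { 10, 0, 0 };

    gapless_publish(&s, demo_committed, 3, 11);
    return gapless_current(&s, demo_committed);
}

/* Writer commits (even demo XID): readers see the new value. */
int64_t
demo_committed_writer(void)
{
    GaplessSeq s = { 10, 0, 0 };

    gapless_publish(&s, demo_committed, 4, 11);
    return gapless_current(&s, demo_committed);
}
```

Nothing has to be undone on abort: the aborted writer's value is simply never considered current, which is what makes the scheme attractive compared to the auxiliary values table.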
Re: [HACKERS] PQgetssl() and alternative SSL implementations
Heikki Linnakangas hlinnakan...@vmware.com writes: Right, that was the idea. I wanted it to include the word OpenSSL, to make it clear in the callers that it's specific to OpenSSL. And SSL, because that's the name of the struct. I agree it looks silly, though. One idea is to have two separate arguments: the implementation name, and the struct name. PQgetSSLstruct(ssl, "OpenSSL", "SSL") would look less silly. That's probably overkill. Why not establish a convention that the main API struct for the library doesn't have to be named? So it's just PQgetSSLstruct(ssl, "OpenSSL"), and you only need strange naming if you're dealing with a library that actually has more than one API object that needs to be fetched this way. (That set is likely empty...) regards, tom lane
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
David G Johnston david.g.johns...@gmail.com writes: Am I missing something or has there been no consideration in this forbid plan on whether users will be able to retrieve, even if partially incorrectly, any jsonb data that has already been stored? Data that's already been stored would look like the six characters \u0000, which indeed might have been what it was anyway, since part of the problem here is that we fail to distinguish \\u0000 from \u0000. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
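The C-string problem Tom mentions up-thread is easy to see in isolation. This is an illustration only, not jsonb code: once `\u0000` is de-escaped to a real NUL byte, anything that handles the buffer with C string conventions silently truncates it.

```c
#include <assert.h>
#include <string.h>

/* After de-escaping the JSON text "abc\u0000def", the value is the
 * 7-byte sequence below -- but any code that receives it as a
 * NUL-terminated C string, as the JsonSemAction callbacks do, silently
 * sees only the first 3 bytes. */
static const char deescaped[7] = {'a', 'b', 'c', '\0', 'd', 'e', 'f'};

/* Length as seen through C string conventions. */
static size_t
visible_len_as_cstring(const char *s)
{
    return strlen(s);       /* stops at the embedded NUL */
}
```

This is why fixing it properly means changing the callback API to pass an explicit length rather than a C string.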
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
Merlin Moncure mmonc...@gmail.com writes: ...hm, I spoke too soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an assertion every single time: TRAP: FailedAssertion(!(flags & 0x0010), File: dynahash.c, Line: 330) There's no Assert at line 330 in 9.4, though there is in HEAD. I suspect what you've got here is a version mismatch; in particular commit 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c such that external modules would need to be recompiled to use it without error. I'm not real sure though how you are getting past the loadable-module version check. Anyway, I'd try make distclean and full rebuild before anything else. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-28 13:38:51 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I personally still think that a comment above sbufdesc's definition would be sufficient for now. But whatever. I'll enforce 64byte padding on 64bit platforms, and do nothing on 32bit platforms. Patch doing that attached. Surely the sizeof() in BufferShmemSize needs to be sizeof(BufferDescPadded) now? Good catch. It'd surely fail nastily later, once somebody actually made the size change away from 64 bytes. Also, the macro is unnecessarily unsafe for use in #if tests; why not something like Out of curiosity: What's the danger you're seeing? I'm perfectly fine with using SIZEOF_VOID_P, I actually looked for a macro like it and didn't find anything. #define BUFFERDESC_PADDED_SIZE (SIZEOF_VOID_P == 8 ? 64 : 32) Hm, did you intentionally put a 32 in there or was that just the logical continuation of 64? Because there's no way it'll ever fit into 32 bytes in the near future. That's why I had put the sizeof(BufferDesc) there. We could just make it 1 as well, to indicate that we don't want any padding... Otherwise it looks reasonably sane to me, just in a quick once-over; I didn't test. I tested it on both 32/64 bit linux and it indeed fixes the issue of changing performance due to max_connections+-1. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Tom Lane-2 wrote Andrew Dunstan andrew@ writes: On 01/27/2015 02:28 PM, Tom Lane wrote: Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll regret it if we don't change the representation. The only other plausible answer seems to be to flat out reject \u0000. But I assume nobody likes that. I don't think we can be in the business of rejecting valid JSON. Actually, after studying the code a bit, I wonder if we wouldn't be best off to do exactly that, at least for 9.4.x. At minimum we're talking about an API change for JsonSemAction functions (which currently get the already-de-escaped string as a C string; not gonna work for embedded nulls). I'm not sure if there are already third-party extensions using that API, but it seems possible, in which case changing it in a minor release wouldn't be nice. Even ignoring that risk, making sure we'd fixed everything seems like more than a day's work, which is as much as I for one could spare before 9.4.1. So at this point I propose that we reject \u0000 when de-escaping JSON. Anybody who's seriously unhappy with that can propose a patch to fix it properly in 9.5 or later. The hybrid option is to reject the values for 9.4.1 but then commit to removing that hack and fixing this properly in 9.4.2; we can always call that release 9.5... I think the "it would mean rejecting valid JSON" argument is utter hogwash. We already reject, eg, \u00A0 if you're not using a UTF8 encoding. And we reject 1e1, not because that's invalid JSON but because of an implementation restriction of our underlying numeric type. I don't see any moral superiority of that over rejecting \u0000 because of an implementation restriction of our underlying text type. Am I missing something or has there been no consideration in this forbid plan on whether users will be able to retrieve, even if partially incorrectly, any jsonb data that has already been stored?
If we mangled their data on input we should at least return the data and provide them a chance to manually (or automatically depending on their data) fix our mistake. Given we already disallow NUL in text ISTM that allowing said data in other text-like areas is asking for just the kind of trouble we are seeing here. I'm OK with the proposition that those wishing to utilize NUL are relegated to working with bytea. From the commit Tom references down-thread: However, this led to some perverse results in the case of Unicode sequences. Given that said results are not documented in the commit, it's hard to judge whether a complete revert is being overly broad... Anyway, just some observations since I'm not currently a user of JSON. Tom's arguments and counter-arguments ring true to me in the general sense. The DBA staying on 9.4.0 because of this change probably just needs to be told to go back to using json and then run the update. Their data has issues even if they stay on 9.4.0 with the more accepting version of jsonb. David J. -- View this message in context: http://postgresql.nabble.com/jsonb-unicode-escapes-and-escaped-backslashes-tp5834962p5835824.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-28 17:08:46 +0100, Andres Freund wrote: I just have no idea whether it'd be beneficial to use more space on 32bit to pad the individual entries. Since this mostly is beneficial on multi-socket, highly concurrent workloads, I doubt it really matters. I personally still think that a comment above sbufdesc's definition would be sufficient for now. But whatever. I'll enforce 64byte padding on 64bit platforms, and do nothing on 32bit platforms. Patch doing that attached. I've for now dealt with the 32bit issue by: #define BUFFERDESC_PADDED_SIZE (sizeof(void*) == 8 ? 64 : sizeof(BufferDesc)) and * XXX: As this primarily matters in highly concurrent workloads which * probably all are 64bit these days, and the space wastage would be a bit * more noticeable on 32bit systems, we don't force the stride to be cache * line sized. If somebody does actual performance testing, we can reevaluate. I've replaced direct accesses to the (Local)BufferDescriptors array with wrapper macros to hide the union indirection and allow potential future changes to be made more easily. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From caeb98e4ab48748556f4983fdaa52098a190aa57 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Wed, 28 Jan 2015 18:51:50 +0100 Subject: [PATCH 1/3] Fix #ifdefed'ed out code to compile again.
--- src/backend/storage/buffer/bufmgr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 7eb2d22..f92d0ef 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -2764,7 +2764,7 @@ PrintPinnedBufs(void) [%02d] (freeNext=%d, rel=%s, blockNum=%u, flags=0x%x, refcount=%u %d), i, buf->freeNext, - relpath(buf->tag.rnode, buf->tag.forkNum), + relpathperm(buf->tag.rnode, buf->tag.forkNum), buf->tag.blockNum, buf->flags, buf->refcount, GetPrivateRefCount(i + 1)); } -- 2.0.0.rc2.4.g1dc51c6.dirty From 2bb5f85d8c46cea30b3d1d80b781574c2f2a3903 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Wed, 28 Jan 2015 13:55:36 +0100 Subject: [PATCH 2/3] Align buffer descriptors to cache line boundaries. Benchmarks have shown that aligning the buffer descriptor array to cache lines is important for scalability, especially on bigger, multi-socket, machines. Currently the array sometimes already happens to be aligned by happenstance, depending on how large previous shared memory allocations were. That can lead to wildly varying performance results after minor configuration changes. In addition to aligning the start of the array, also force the size of individual descriptors to be of common cache line size (64 bytes). That happens to already be the case on 64bit platforms, but this way we can change the definition more easily. As the alignment primarily matters in highly concurrent workloads which probably all are 64bit these days, and the space wastage of element alignment would be a bit more noticeable on 32bit systems, we don't force the stride to be cacheline sized for now. If somebody does actual performance testing, we can reevaluate that decision by changing the definition of BUFFERDESC_PADDED_SIZE. Discussion: 20140202151319.gd32...@awork2.anarazel.de Per discussion with Bruce Momjian, Robert Haas, Tom Lane and Peter Geoghegan.
--- contrib/pg_buffercache/pg_buffercache_pages.c | 6 ++- src/backend/storage/buffer/buf_init.c | 19 src/backend/storage/buffer/bufmgr.c | 66 ++- src/backend/storage/buffer/freelist.c | 6 +-- src/backend/storage/buffer/localbuf.c | 12 ++--- src/include/c.h | 1 + src/include/storage/buf_internals.h | 34 +- 7 files changed, 91 insertions(+), 53 deletions(-) diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c index d00f985..98016fc 100644 --- a/contrib/pg_buffercache/pg_buffercache_pages.c +++ b/contrib/pg_buffercache/pg_buffercache_pages.c @@ -73,7 +73,6 @@ pg_buffercache_pages(PG_FUNCTION_ARGS) if (SRF_IS_FIRSTCALL()) { int i; - volatile BufferDesc *bufHdr; funcctx = SRF_FIRSTCALL_INIT(); @@ -146,8 +145,11 @@ pg_buffercache_pages(PG_FUNCTION_ARGS) * Scan through all the buffers, saving the relevant fields in the * fctx->record structure. */ - for (i = 0, bufHdr = BufferDescriptors; i < NBuffers; i++, bufHdr++) + for (i = 0; i < NBuffers; i++) { + volatile BufferDesc *bufHdr; + + bufHdr = GetBufferDescriptor(i); /* Lock each buffer header before inspecting. */ LockBufHdr(bufHdr); diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index 4434bc3..c013e10 100644 --- a/src/backend/storage/buffer/buf_init.c +++
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
Andres Freund and...@2ndquadrant.com writes: I personally still think that a comment above sbufdesc's definition would be sufficient for now. But whatever. I'll enforce 64byte padding on 64bit platforms, and do nothing on 32bit platforms. Patch doing that attached. Surely the sizeof() in BufferShmemSize needs to be sizeof(BufferDescPadded) now? Also, the macro is unnecessarily unsafe for use in #if tests; why not something like #define BUFFERDESC_PADDED_SIZE (SIZEOF_VOID_P == 8 ? 64 : 32) Otherwise it looks reasonably sane to me, just in a quick once-over; I didn't test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
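The union-padding technique being discussed can be demonstrated without PostgreSQL headers. The fields below are stand-ins for the real BufferDesc, 64 is the assumed cache line size, and GET_DESC mimics the GetBufferDescriptor() wrapper that hides the union indirection — a sketch, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the real BufferDesc fields. */
typedef struct BufferDescSketch
{
    int         buf_id;
    int         flags;
    unsigned    refcount;
    void       *content_lock;
} BufferDescSketch;

/* Pad each entry to at least one cache line, so that once the array
 * start is aligned, adjacent descriptors never share a cache line. */
#define BUFFERDESC_MIN_SIZE 64

typedef union BufferDescPaddedSketch
{
    BufferDescSketch desc;
    char             pad[BUFFERDESC_MIN_SIZE];
} BufferDescPaddedSketch;

/* Accessor hiding the union indirection, like GetBufferDescriptor(). */
#define GET_DESC(array, i) (&(array)[(i)].desc)
```

The union's char array enforces a minimum size, which is exactly why Tom calls the macro a min size rather than an exact size: a struct larger than the pad array would silently grow the union instead of failing.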
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote: I still haven't categorically ruled out pl/sh yet; that's something to keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the problem went away. So again, it looks like this might be a lot of false alarm. A pl/sh driven failure might still be interesting if it's coming from the internal calls it's making, so I'm still chasing it down. ...hm, I spoke too soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an assertion every single time: mmoncure@mernix2 12:25 PM (REL9_4_STABLE) ~/src/p94$ cat /mnt/ssd/data/pg_log/postgresql-28.log [ 12287 2015-01-28 12:24:24.080 CST 0]LOG: received smart shutdown request [ 13516 2015-01-28 12:24:24.080 CST 0]LOG: autovacuum launcher shutting down [ 13513 2015-01-28 12:24:24.081 CST 0]LOG: shutting down [ 13513 2015-01-28 12:24:24.083 CST 0]LOG: database system is shut down [ 14481 2015-01-28 12:24:25.127 CST 0]LOG: database system was shut down at 2015-01-28 12:24:24 CST [ 14457 2015-01-28 12:24:25.129 CST 0]LOG: database system is ready to accept connections [ 14485 2015-01-28 12:24:25.129 CST 0]LOG: autovacuum launcher started TRAP: FailedAssertion(!(flags & 0x0010), File: dynahash.c, Line: 330) [ 14457 2015-01-28 12:24:47.983 CST 0]LOG: server process (PID 14545) was terminated by signal 6: Aborted [ 14457 2015-01-28 12:24:47.983 CST 0]DETAIL: Failed process was running: SELECT CDSStartRun() [ 14457 2015-01-28 12:24:47.983 CST 0]LOG: terminating any other active server processes [cds2 14546 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process [cds2 14546 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [cds2 14546 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command. [ 14485 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process [ 14485 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [ 14485 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command. [ 14457 2015-01-28 12:24:47.984 CST 0]LOG: all server processes terminated; reinitializing [ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was interrupted; last known up at 2015-01-28 12:24:25 CST [ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was not properly shut down; automatic recovery in progress [ 14554 2015-01-28 12:24:47.997 CST 0]LOG: invalid magic number in log segment 00010001, offset 13000704 [ 14554 2015-01-28 12:24:47.997 CST 0]LOG: redo is not required [ 14457 2015-01-28 12:24:48.000 CST 0]LOG: database system is ready to accept connections [ 14558 2015-01-28 12:24:48.000 CST 0]LOG: autovacuum launcher started merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump with both --serializable-deferrable and -j
Alexander Korotkov aekorot...@gmail.com wrote: when pg_dump is run with both --serializable-deferrable and -j options to pg_dump, it returns errors: pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [parallel archiver] query was: SET TRANSACTION SNAPSHOT '0001E300-1' pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE I've checked it on 9.4.0 and 9.3.5. So, these options are incompatible now. The only way to make them compatible would be to have some way for an exported snapshot to indicate that it is safe against serialization anomalies if used for a serializable transaction, and to only generate an error if a non-safe snapshot is used with the --serializable-deferrable option. Could we start snapshot-importing transaction with repeatable read isolation level? You can if you don't use the option which specifies that you want serializable behavior. Why specify --serializable-deferrable if you don't? AFAICS, they should read exactly same data as snapshot-exporting serializable transaction. Sort of. The behavior once they have a snapshot and are running is the same; the difference is whether the snapshot can see a transient state which would not be consistent with some serial order of transaction execution. For examples, see: https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions It's really a question of why you want the dump. If it is for crash recovery, you don't need --serializable-deferrable. 
If you want to restore it to run reports that only show states consistent with some serial execution of serializable transactions you *do* need --serializable-deferrable, so that the snapshot used for the dump reflects a point in the commit stream when no serialization anomalies can be visible to a read-only transaction. If not, could pg_dump return some more friendly error before actually trying to dump? Sure, until such time as it is possible to request a serializable-safe snapshot and flag it as such on export (and check for that in pg_dump), we could add a validation that these two options are incompatible. If there are no objections, I can push something to do that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PQgetssl() and alternative SSL implementations
On 08/20/2014 12:58 AM, Heikki Linnakangas wrote: On 08/19/2014 10:31 PM, Robert Haas wrote: On Tue, Aug 19, 2014 at 3:16 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert's got a point though: there is always going to be somebody who wants something we fail to expose. It's better to be able to say well, you can do PQgetssl and then munge it for yourself than to have to say sorry, you're screwed. So if we're going to define PQgetssl as returning NULL when you're not using OpenSSL, I don't see why we shouldn't expose a similarly-defined PQgetXXX for each other underlying implementation we support. There will not be that many of 'em, and I suspect the people with very specific needs will not care about more than one underlying library anyway. This does not say that we shouldn't also try to have some library-independent functionality for interrogating certificate state etc. Just that having an escape hatch isn't a bad thing. Yeah, wouldn't hurt I guess. I do agree that having both would be useful. We could have something like int PQgetSSLstruct(void **sslstruct) I think it's likely smarter to have totally separate functions. First, to make it less likely that users will try to use a pointer to one type of object as a pointer to some other kind of object. And second, because you might, for example, someday have an SSL implementation that wants to return two pointers. May as well make that kind of thing easy.
The struct it returns is totally SSL-implementation specific anyway, so for an implementation that would like to return two structs, you could well define it to return a struct like: struct { CoolStructA *a; CoolStructB *b; } CoolSSLStruct; I don't much like adding a separate function for every SSL implementation, but you've got a point that it would be nice to make it difficult to call PQgetSSLstruct() and just assume that the returned struct is e.g. an OpenSSL struct, while it's actually something else. Perhaps: int PQgetSSLstruct(void **sslstruct, char *structname) You'd call it like PQgetSSLstruct(&mystruct, "openssl"), and it checks that the argument matches the library actually being used, otherwise it returns an error. And if you need to return two structs, you'd call it twice: PQgetSSLstruct(&a, "cool_a") and PQgetSSLstruct(&b, "cool_b"). Here's a patch to implement the above scheme. It adds four functions to libpq, to interrogate the SSL status: int PQsslInUse(const PGconn *conn) Returns true (1) if the connection uses SSL, false (0) if not. const char *PQsslAttribute(const PGconn *conn, const char *attribute_name) Returns a piece of information. The list of attributes depends on the implementation, but there are a few that are expected to be supported by all of them. See docs for details. const char **PQsslAttributes(const PGconn *conn); Return an array of SSL attribute names available. void *PQsslStruct(const PGconn *conn, const char *struct_name) Return a pointer to an SSL-implementation specific object describing the connection. PQsslStruct(conn, "OpenSSL SSL") is equivalent to PQgetssl(conn). I think this is expandable enough, because you can easily add attributes later on, and different implementations can support different attributes. It contains the escape hatch for applications that need to do more, and have intimate knowledge of OpenSSL structs. It's also pretty easy to use. Thoughts?
- Heikki From 0e0f2ceb86cf99611c00e57efdbc84615347542e Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Thu, 27 Nov 2014 20:00:36 +0200 Subject: [PATCH 1/1] Add API functions to libpq to interrogate SSL related stuff. This makes it possible to query for things like the SSL version and cipher used, without depending on OpenSSL functions or macros. That is a good thing if we ever get another SSL implementation. --- doc/src/sgml/libpq.sgml | 154 +++ src/bin/psql/command.c | 35 +++ src/interfaces/libpq/exports.txt | 4 + src/interfaces/libpq/fe-secure-openssl.c | 68 ++ src/interfaces/libpq/fe-secure.c | 20 src/interfaces/libpq/libpq-fe.h | 6 ++ 6 files changed, 250 insertions(+), 37 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index de272c5..b96686a 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1848,33 +1848,130 @@ int PQconnectionUsedPassword(const PGconn *conn); /para /listitem /varlistentry + /variablelist + /para -varlistentry id=libpq-pqgetssl - termfunctionPQgetssl/functionindextermprimaryPQgetssl///term + para +The following functions return information related to SSL. This information +usually doesn't change after a connection is established. + +variablelist +varlistentry id=libpq-pqsslinuse +
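The attribute interface proposed above boils down to a string-keyed lookup per SSL implementation. A toy mock of that contract — not libpq code, and the attribute names and values here are only examples:

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical attribute table an SSL backend might expose; the real
 * patch routes PQsslAttribute() through the active implementation. */
typedef struct SSLAttr
{
    const char *name;
    const char *value;
} SSLAttr;

static const SSLAttr demo_attrs[] = {
    {"library",  "OpenSSL"},
    {"protocol", "TLSv1.2"},
    {"cipher",   "ECDHE-RSA-AES256-GCM-SHA384"},
    {NULL, NULL}                /* sentinel */
};

/* Look up an attribute by name; NULL means "not supported", which is
 * what makes the scheme expandable across implementations. */
static const char *
ssl_attribute(const SSLAttr *attrs, const char *name)
{
    for (; attrs->name != NULL; attrs++)
        if (strcmp(attrs->name, name) == 0)
            return attrs->value;
    return NULL;
}
```

A caller never needs implementation-specific headers for the common attributes, and the NULL-on-unknown convention lets each backend advertise extras without breaking others.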
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown t...@linux.com wrote: On 28 January 2015 at 14:03, Robert Haas robertmh...@gmail.com wrote: The problem here, as I see it, is that we're flying blind. If there's just one spindle, I think it's got to be right to read the relation sequentially. But if there are multiple spindles, it might not be, but it seems hard to predict what we should do. We don't know what the RAID chunk size is or how many spindles there are, so any guess as to how to chunk up the relation and divide up the work between workers is just a shot in the dark. Can't the planner take effective_io_concurrency into account? Maybe. It's answering something like the right question -- telling us how many parallel I/O channels we think we've got. But I'm not quite sure what to do with that information in this case. I mean, if we've got effective_io_concurrency = 6, does that mean it's right to start scans in 6 arbitrary places in the relation and hope that keeps all the drives busy? That seems like throwing darts at the wall. We have no idea which parts are on which underlying devices. Or maybe it means we should prefetch 24MB, on the assumption that the RAID stripe is 4MB? That's definitely blind guesswork. Considering the email Amit just sent, it looks like on this machine, regardless of what algorithm we used, the scan took between 3 minutes and 5.5 minutes, and most of them took between 4 minutes and 5.5 minutes. The results aren't very predictable, more workers don't necessarily help, and it's not really clear that any algorithm we've tried is clearly better than any other. I experimented with prefetching a bit yesterday, too, and it was pretty much the same. Some settings made it slightly faster. Others made it slower. Whee!
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
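For concreteness, a naive static split of the relation among workers might look like the sketch below. It is purely illustrative (not the patch under discussion), and it demonstrates the thread's point: without knowing stripe size or spindle layout, any choice of contiguous ranges is a guess about which disks end up busy.

```c
#include <assert.h>

/* Divide an nblocks-long relation into nworkers contiguous ranges;
 * worker i scans blocks [start, start + len).  The first (nblocks %
 * nworkers) workers each take one extra block so the ranges cover the
 * relation exactly. */
static void
chunk_range(unsigned nblocks, unsigned nworkers, unsigned worker,
            unsigned *start, unsigned *len)
{
    unsigned base = nblocks / nworkers;
    unsigned rem  = nblocks % nworkers;

    *len   = base + (worker < rem ? 1 : 0);
    *start = worker * base + (worker < rem ? worker : rem);
}
```

Whether three such ranges land on three spindles or all on one is exactly the information the planner does not have.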
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: ...hm, I spoke too soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an assertion every single time: TRAP: FailedAssertion(!(flags & 0x0010), File: dynahash.c, Line: 330) There's no Assert at line 330 in 9.4, though there is in HEAD. I suspect what you've got here is a version mismatch; in particular commit 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c such that external modules would need to be recompiled to use it without error. I'm not real sure though how you are getting past the loadable-module version check. Anyway, I'd try make distclean and full rebuild before anything else. you're right -- git got confused and built a 9.5 devel at some random commit. I reran it again on 9.4 and it worked ok :(. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
Andres Freund and...@2ndquadrant.com writes: On 2015-01-28 13:38:51 -0500, Tom Lane wrote: #define BUFFERDESC_PADDED_SIZE (SIZEOF_VOID_P == 8 ? 64 : 32) Hm, did you intentionally put a 32in there or was that just the logical continuation of 64? Because there's no way it'll ever fit into 32 bytes in the near future. That's why I had put the sizeof(BufferDesc) there. We could just make it 1 as well, to indicate that we don't want any padding... Yeah, 1 would be fine too. Maybe better to call it BUFFERDESC_MIN_SIZE, because as this stands it's enforcing a min size not exact size. (I'm not going to whinge about that aspect of it; the main point here is to put in the union and fix the ensuing notational fallout. We can worry about exactly what size to pad to as a separate discussion.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Jan 28, 2015 at 4:46 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Wed, Jan 28, 2015 at 9:47 AM, Jim Nasby jim.na...@bluetreble.com wrote: On 1/27/15 1:04 AM, Haribabu Kommi wrote: Here I attached the latest version of the patch. I will add this patch to the next commitfest. Apologies if this was covered, but why isn't the IP address an inet instead of text? Corrected the address type as inet instead of text. Updated patch is attached. Also, what happens if someone reloads the config in the middle of running the SRF? hba entries are reloaded only in the postmaster process, not in every backend. So there shouldn't be any problem with a config file reload. Am I missing something? I don't have any comment to disagree with the patch (idea and code). But thinking about this patch: wouldn't it be interesting to have an FDW to manipulate the hba file? Imagine if we were able to manipulate the HBA file using INSERT/UPDATE/DELETE. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
Fabrízio de Royes Mello fabriziome...@gmail.com writes: But thinking about this patch: wouldn't it be interesting to have an FDW to manipulate the hba file? Imagine if we were able to manipulate the HBA file using INSERT/UPDATE/DELETE. Since the HBA file is fundamentally order-dependent, while SQL tables are fundamentally not, that doesn't seem like a great API match. You could probably brute-force something that would work, but it would very much be a case of using a hammer to solve a screwdriver problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] compiler warnings in copy.c
My compiler is unhappy with the latest changes to copy.c: gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Wall -Werror -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o copy.o copy.c -MMD -MP -MF .deps/copy.Po copy.c: In function ‘DoCopy’: copy.c:924:30: error: ‘rte’ may be used uninitialized in this function [-Werror=uninitialized] copy.c:793:17: note: ‘rte’ was declared here cc1: all warnings being treated as errors From what I can see, this is a pretty legitimate complaint. If stmt->relation == NULL, then rte never gets initialized, but we still do cstate->range_table = list_make1(rte). That can't be good. Also, you seem to have pushed these commits with a date more than two weeks in the past. Please don't do that! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
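The warning reduces to a familiar shape, sketched here with hypothetical names rather than the actual copy.c code: a variable assigned on only one branch but used unconditionally afterwards. The standard fix is a definite initialization plus a well-defined meaning for the "never assigned" case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a range table entry. */
struct rte_sketch
{
    int relid;
};

/* `rte` is assigned only when a relation is present; initializing it
 * to NULL makes the other path well-defined and silences
 * -Wuninitialized, instead of handing garbage to the caller. */
static const struct rte_sketch *
lookup_rte(int have_relation, const struct rte_sketch *from_relation)
{
    const struct rte_sketch *rte = NULL;    /* the fix: definite init */

    if (have_relation)
        rte = from_relation;

    return rte;             /* NULL signals "no range table entry" */
}
```

In the real code the caller must also avoid building a range table from the unset value, i.e. skip the list_make1() when there is no relation.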
Re: [HACKERS] compiler warnings in copy.c
* Robert Haas (robertmh...@gmail.com) wrote: My compiler is unhappy with the latest changes to copy.c: gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Wall -Werror -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o copy.o copy.c -MMD -MP -MF .deps/copy.Po copy.c: In function ‘DoCopy’: copy.c:924:30: error: ‘rte’ may be used uninitialized in this function [-Werror=uninitialized] copy.c:793:17: note: ‘rte’ was declared here cc1: all warnings being treated as errors Huh, interesting that mine didn't. From what I can see, this is a pretty legitimate complaint. If stmt->relation == NULL, then rte never gets initialized, but we still do cstate->range_table = list_make1(rte). That can't be good. Yeah, I'll fix that. Also, you seem to have pushed these commits with a date more than two weeks in the past. Please don't do that! Oh, wow, sorry about that. I had expected a rebase to update the date. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] compiler warnings in copy.c
On 2015-01-28 15:05:11 -0500, Stephen Frost wrote: Also, you seem to have pushed these commits with a date more than two weeks in the past. Please don't do that! Oh, wow, sorry about that. I had expected a rebase to update the date. It updates the committer date, but not the author date. Use --pretty=fuller to see all the details. You can pass rebase --ignore-date to also reset the author date. Or commit --amend --reset-author. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers