Re: [HACKERS] GiST penalty functions [PoC]
> autoconf check for IEEE 754 floats

The Autoconf manual says the following:

> it is safe to assume IEEE-754 in most portable code these days

https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability

> A union might be more readable

Here is the union version of the patch. It's 10% slower than the original cube and the dereference version. I have no idea why. Select performance is improved as in v3.

Also, I've investigated a bit why the linear packing failed. It's useful to think in terms of bijections, not in terms of arithmetic functions.

float 1              is 1065353216, hex 3f800000
FLT_MAX / ( 8 >> 3 ) is 2139095039, hex 7f7fffff
FLT_MAX / ( 8 >> 2 ) is 2130706431, hex 7effffff
FLT_MAX / ( 8 >> 1 ) is 2122317823, hex 7e7fffff
FLT_MAX / ( 8 >> 0 ) is 2113929215, hex 7dffffff

These realm borders are completely wrong. That means I used 800 thousand values to pack the 2 billion values of realms 1-3, and all the other 2.1 billion values to pack realm 0. The fragile arithmetic was done wrong. What I had to use was:

Realm 3: from 3*INT32_MAX/4 (3.689348e+19) up to INT32_MAX (NaN, or something a little less than NaN); in total a little less than 512M different values
Realm 2: from INT32_MAX/2 (2.00e+00) up to 3*INT32_MAX/4 (3.689348e+19); in total 512M different values
Realm 1: from INT32_MAX/4 (1.084202e-19) up to INT32_MAX/2 (2.00e+00); in total 512M different values
Realm 0: from 0 (0.00e+00) up to INT32_MAX/4 (1.084202e-19); in total 512M different values

I haven't tested it yet, though. BTW, I'm not sure hardcoding these constants is a good idea.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 3feddef..ded639b 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -91,14 +91,16 @@ PG_FUNCTION_INFO_V1(cube_enlarge);
 /*
 ** For internal use only
 */
-int32		cube_cmp_v0(NDBOX *a, NDBOX *b);
-bool		cube_contains_v0(NDBOX *a, NDBOX *b);
-bool		cube_overlap_v0(NDBOX *a, NDBOX *b);
-NDBOX	   *cube_union_v0(NDBOX *a, NDBOX *b);
-void		rt_cube_size(NDBOX *a, double *sz);
-NDBOX	   *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep);
-bool		g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy);
-bool		g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy);
+static int32 cube_cmp_v0(NDBOX *a, NDBOX *b);
+static bool cube_contains_v0(NDBOX *a, NDBOX *b);
+static bool cube_overlap_v0(NDBOX *a, NDBOX *b);
+static NDBOX *cube_union_v0(NDBOX *a, NDBOX *b);
+static float pack_float(float actualValue, int realm);
+static void rt_cube_size(NDBOX *a, double *sz);
+static void rt_cube_edge(NDBOX *a, double *sz);
+static NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep);
+static bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy);
+static bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy);
 
 /*
 ** Auxiliary funxtions
@@ -419,7 +421,32 @@ g_cube_decompress(PG_FUNCTION_ARGS)
 	}
 	PG_RETURN_POINTER(entry);
 }
+
+/*
+ * Function to pack floats of different realms.
+ * This function serves to pack bit flags inside the float type.
+ * The resulting value can be from four different "realms".
+ * Every value from realm 3 is greater than any value from realms 2, 1 and 0.
+ * Every value from realm 2 is less than every value from realm 3 and greater
+ * than any value from realms 1 and 0, and so on. Values from the same realm
+ * lose two bits of precision. This technique is possible due to the IEEE 754
+ * floating point specification: the exponent bits are the highest
+ * (excluding the sign bit, but here the penalty is always positive). If float a
+ * is greater than float b, the integer A with the same bit representation as a
+ * is greater than the integer B with the same bits as b.
+ */
+static float
+pack_float(float actualValue, int realm)
+{
+	union
+	{
+		float fp;
+		int32 bits;
+	} buffer;
+
+	buffer.fp = actualValue;
+	buffer.bits = (buffer.bits >> 2) + (INT32_MAX / 4) * realm;
+	return buffer.fp;
+}
 
 /*
 ** The GiST Penalty method for boxes
@@ -428,6 +455,11 @@ g_cube_decompress(PG_FUNCTION_ARGS)
 Datum
 g_cube_penalty(PG_FUNCTION_ARGS)
 {
+	/* REALM 0: No extension is required, volume is zero, return edge */
+	/* REALM 1: No extension is required, return nonzero volume */
+	/* REALM 2: Volume extension is zero, return nonzero edge extension */
+	/* REALM 3: Volume extension is nonzero, return it */
+
 	GISTENTRY  *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
 	GISTENTRY  *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
 	float	   *result = (float *) PG_GETARG_POINTER(2);
@@ -441,9 +473,33 @@ g_cube_penalty(PG_FUNCTION_ARGS)
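To sanity-check borders like these, a tiny standalone C program (not part of the patch; it assumes IEEE-754 binary32 floats) can print the float whose bit pattern sits at each proposed realm border:

#include <stdio.h>
#include <stdint.h>

/* Reinterpret an int32 bit pattern as a float (IEEE-754 binary32 assumed). */
static float
float_from_bits(int32_t bits)
{
	union { float fp; int32_t bits; } u;

	u.bits = bits;
	return u.fp;
}

int
main(void)
{
	printf("realm 0/1 border: %e\n", float_from_bits(INT32_MAX / 4));		/* ~1.084202e-19 */
	printf("realm 1/2 border: %e\n", float_from_bits(INT32_MAX / 2));		/* ~2.0 */
	printf("realm 2/3 border: %e\n", float_from_bits(3 * (INT32_MAX / 4)));	/* ~3.689348e+19 */
	return 0;
}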
Re: [HACKERS] Stopping logical replication protocol
On 7 September 2016 at 15:44, Vladimir Gordiychuk wrote:

> Fixed patch in attach.

It'd be helpful if you summarized the changes made when posting revisions.

>> * There are no tests. I don't see any really simple way to test this, though.
>
> I will be grateful if you specify the best way how to do it. I test these patches by pgjdbc driver tests that also build on head of postgres. You can check the test scenario for both patches in PR https://github.com/pgjdbc/pgjdbc/pull/550
>
> org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive
> org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive

Yeah, testing it isn't trivial in postgres core, since of course none of the tools know to send a client-initiated CopyDone.

My suggestion is to teach pg_recvlogical to send CopyDone on the first SIGINT (control-C) and wait until the server ends the data stream and returns to command mode. A second control-C should unilaterally close the connection, and it should close after a timeout too. This will be backward compatible with 9.4/5/6 (just with a delay on exit by SIGINT).

Then in a TAP test in src/test/recovery, set up a logical decoding session with test_decoding and pg_recvlogical. Do a test xact, then SIGINT pg_recvlogical and verify that it exits. To make sure it exited by mutual agreement with the server, probably run pg_recvlogical at a higher debug level and text-search for a message you print from pg_recvlogical when it gets the server CopyDone in response to the client CopyDone. I don't think a different pg_recvlogical numeric exit code could be justified for this.

It sounds like more work than I think it would actually be.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
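A rough sketch of that pg_recvlogical behaviour, using libpq's documented COPY calls (the helper and flag names here are hypothetical, not the eventual patch):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

static volatile sig_atomic_t got_sigint = 0;

/* Registered with signal(SIGINT, sigint_handler) at startup. */
static void
sigint_handler(int signum)
{
	got_sigint++;				/* first Ctrl-C: graceful; second: hard exit */
}

/* Hypothetical helper, called from the main copy loop. */
static void
handle_sigint(PGconn *conn)
{
	if (got_sigint == 1)
	{
		/*
		 * Ask the server to end the stream; the caller keeps reading until
		 * the server replies with its own CopyDone and returns to command
		 * mode (or a timeout expires).
		 */
		if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn) != 0)
			fprintf(stderr, "could not send CopyDone: %s", PQerrorMessage(conn));
	}
	else if (got_sigint > 1)
	{
		/* Second Ctrl-C: close the connection unilaterally. */
		PQfinish(conn);
		exit(1);
	}
}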
Re: [HACKERS] [COMMITTERS] pgsql: Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL
On 7 September 2016 at 14:58, Tom Lane wrote:

>> That may not be perceived as a "fix" by everybody, so we should not do it without an explicit agreement by many.
>
> Agreed, but I vote with Fujii-san for back-patching.

No problem with backpatching, just wanted some +1s before I did it. Will do that tomorrow.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Bug in two-phase transaction recovery
On 8 September 2016 at 07:43, Michael Paquier wrote:

> On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich wrote:
>> Some time ago the two-phase state file format was changed to have a variable-size GID, but several places that read those files were not updated to use the new offsets. The problem exists in master and 9.6 and can be reproduced on prepared transactions with savepoints.
>
> Oops and meh. This merits an open item, and had better be fixed by 9.6.0. I am glad you noticed that honestly. And we had better take care of this issue as soon as possible. Looking now.
>
>> Also, while looking at StandbyRecoverPreparedTransactions() I've noticed that the buffer for the 2PC file is allocated in TopMemoryContext but never freed. That has probably existed for a long time.
>
> @@ -1886,6 +1886,8 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
>  			Assert(TransactionIdFollows(subxid, xid));
>  			SubTransSetParent(xid, subxid, overwriteOK);
>  		}
> +
> +	pfree(buf);
>  }
>
> This one is a good catch. I have checked also the other callers of ReadTwoPhaseFile but I am not seeing any other leak. That's a leak, not critical though, so applying it only on HEAD would be enough IMO.

Far from critical, but backpatched to 9.6 because it isn't just executed once at startup; it is executed at every shutdown checkpoint.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH] Alter or rename enum value
> Given that you are now familiar with the internals of how enums are implemented, would it be possible to continue the work and add the "ALTER TYPE DROP VALUE" command?

I was thinking of trying to develop it, but I would be more than happy to help by testing and reviewing if someone else would do it. I don't think I have enough experience to think of all the details of this feature.

The main problem that has been discussed before was the indexes. One way to tackle it is to reindex all the tables after the operation. Currently we are doing that when the datatype of indexed columns changes. So it should be possible, but very expensive.

Another way, which Thomas Munro suggested when we were talking about it, would be to add another column to the catalog table to mark the deleted rows. We can check this column before allowing the value to be used. Indexes can continue using the deleted values. We can also re-use those entries when someone wants to add a new value in the matching place. It should be safe as long as we don't update the sort order.
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Sep 7, 2016 at 2:39 AM, Robert Haas wrote:
> On Tue, Sep 6, 2016 at 10:28 PM, Claudio Freire wrote:
>>> The problem with this is that we allocate the entire amount of maintenance_work_mem even when the number of actual dead tuples turns out to be very small. That's not so bad if the amount of memory we're potentially wasting is limited to ~1 GB, but it seems pretty dangerous to remove the 1 GB limit, because somebody might have maintenance_work_mem set to tens or hundreds of gigabytes to speed index creation, and allocating that much space for a VACUUM that encounters 1 dead tuple does not seem like a good plan.
>>>
>>> What I think we need to do is make some provision to initially allocate only a small amount of memory and then grow the allocation later if needed. For example, instead of having vacrelstats->dead_tuples be declared as ItemPointer, declare it as ItemPointer * and allocate the array progressively in segments. I'd actually argue that the segment size should be substantially smaller than 1 GB, like say 64MB; there are still some people running systems which are small enough that allocating 1 GB when we may need only 6 bytes can drive the system into OOM.
>>
>> This would however incur the cost of having to copy the whole GB-sized chunk every time it's expanded. It wouldn't be cheap.
>
> No, I don't want to end up copying the whole array; that's what I meant by allocating it progressively in segments. Something like what you go on to propose.
>
>> I've monitored the vacuum as it runs and the OS doesn't map the whole block unless it's touched, which it isn't until dead tuples are found. Surely, if overcommit is disabled (as it should be), it could exhaust the virtual address space if set very high, but it wouldn't really use the memory unless it's needed; it would merely reserve it.
>
> Yeah, but I've seen actual breakage from exactly this issue on customer systems even with the 1GB limit, and when we start allowing 100GB it's going to get a whole lot worse.
>
>> To fix that, rather than repalloc the whole thing, dead_tuples would have to be an ItemPointer** of sorted chunks. That'd be a significantly more complex patch, but at least it wouldn't incur the memcpy.
>
> Right, this is what I had in mind. I don't think this is actually very complicated, because the way we use this array is really simple. We basically just keep appending to the array until we run out of space, and that's not very hard to implement with an array-of-arrays. The chunks are, in some sense, sorted, as you say, but you don't need to do qsort() or anything like that. You're just replacing a single flat array with a data structure that can be grown incrementally in fixed-size chunks.

If we replaced dead_tuples with an array-of-arrays, wouldn't there be a negative performance impact on lazy_tid_reap()? As chunks are added, that performance would decrease.

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
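A minimal sketch of that array-of-arrays scheme (the names and the 64MB chunk size are illustrative, not from any posted patch; it assumes backend context, i.e. palloc/ItemPointer from postgres.h). Since TIDs are collected in heap order, the chunks stay globally sorted, so lazy_tid_reap() can binary-search over the chunks' last TIDs and then within one chunk, keeping lookup roughly O(log n):

#define CHUNK_TUPLES	((64 * 1024 * 1024) / sizeof(ItemPointerData))	/* 64MB chunks */

typedef struct DeadTupleStore
{
	ItemPointer *chunks;		/* growable array of chunk pointers */
	int			nchunks;
	long		ntuples;		/* total dead tuples stored */
} DeadTupleStore;

static void
record_dead_tuple(DeadTupleStore *store, ItemPointer itemptr)
{
	long		off = store->ntuples % CHUNK_TUPLES;

	if (off == 0)
	{
		/*
		 * Current chunk is full (or none allocated yet): add a new chunk.
		 * Only the small array of chunk pointers is ever reallocated;
		 * existing chunks are never copied.
		 */
		if (store->chunks == NULL)
			store->chunks = palloc(sizeof(ItemPointer));
		else
			store->chunks = repalloc(store->chunks,
									 (store->nchunks + 1) * sizeof(ItemPointer));
		store->chunks[store->nchunks++] =
			palloc(CHUNK_TUPLES * sizeof(ItemPointerData));
	}
	store->chunks[store->nchunks - 1][off] = *itemptr;
	store->ntuples++;
}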
Re: [HACKERS] autonomous transactions
On 8 Sep. 2016 1:38 pm, "Andres Freund" wrote:
> I kind of dislike this approach for a different reason than already mentioned in this thread: namely, it's not re-usable for implementing streaming logical replication of large transactions, i.e. allowing decoding & applying of uncommitted changes. The issue there is that one really needs to have multiple transactions active in the same connection, which this approach does not allow.

I've been thinking about this recently with an eye to handling the majority of transactions on a typical system that neither perform DDL nor are concurrent with it. The following might be fairly nonsensical if I've misunderstood the problem, as I've not had a chance to do more than glance at it, but:

I wonder if some kind of speculative concurrent decoding+output is possible without being able to have multiple xacts on a backend. We could use a shared xact and snapshot for all concurrent xacts. If any of them do anything that requires invalidations etc., we dump the speculatively processed ones and fall back to reorder buffering and serial output at commit time, until we pass the xact that caused an invalidation and don't have more pending. Add a callback to the output plugin to tell it that speculative decoding of an xact has been aborted.

Or maybe even just put that speculative decoding on hold and pick up where we left off partway through the reorder buffer when we get around to processing it serially. That way we don't have to discard work already done. More usefully, we could avoid having to enqueue stuff into the reorder buffer just in case we have to switch to fallback serial decoding.

Failing that:

Since logical decoding only needs read-only xacts that roll back, it only needs multiple backend-private virtualxacts. They don't need SERIALIZABLE/SSI, which I think (?) means other backends don't need to be able to wait on them. That seems simpler than what autonomous xacts would need, since there we need them exposed in PgXact, waitable-on for txid locks, etc. Right? In the same way that historical snapshots are cut down, maybe logical decoding xacts can be too.

I suspect that waiting for full in-process multiple-xact support to do interleaved decoding/replay will mean a long wait indeed.
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Wed, Sep 7, 2016 at 4:03 AM, Josh Berkus wrote:
> On 08/29/2016 06:52 AM, Fujii Masao wrote:
>> Also I like the following Simon's idea.
>>
>> https://www.postgresql.org/message-id/canp8+jlhfbvv_pw6grasnupw+bdk5dctu7gwpnap-+-zwvk...@mail.gmail.com
>> ---
>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>> * any k (n1, n2, n3) – would release waiters as soon as we have the responses from k out of N standbys. “any k” would be faster, so is desirable for performance and resilience
>
> What are we going to do for backwards compatibility, here?
>
> So, here's the dilemma:
>
> If we want to keep backwards compatibility with 9.6, then:
>
> "k (n1, n2, n3)" == "first k (n1, n2, n3)"
>
> However, "first k" is not what most users will want, most of the time; users of version 13, years from now, will be getting constantly confused by "first k" behavior when they wanted quorum. So the sensible default would be:
>
> "k (n1, n2, n3)" == "any k (n1, n2, n3)"

+1.

"k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward compatibility, but most users would read "k (n1, n2, n3)" as quorum once quorum is introduced. I wish we could change the s_s_names syntax of 9.6 to the "first k (n1, n2, n3)" style before 9.6 is released, if we get consensus.

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Wed, Sep 7, 2016 at 12:47 AM, Masahiko Sawada wrote:
> On Tue, Sep 6, 2016 at 11:08 PM, Simon Riggs wrote:
>> On 29 August 2016 at 14:52, Fujii Masao wrote:
>>> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek wrote:
>>>> On 04/08/16 06:40, Masahiko Sawada wrote:
>>>>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier wrote:
>>>>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada wrote:
>>>>>>> I was thinking that the syntax for the quorum method would use '[ ... ]', but it would be confused with the '( ... )' priority method. The 001 patch adds an 'Any N ( ... )' style syntax, but I know that we still might need to discuss better syntax; discussion is very welcome. Attached draft patch, please give me feedback.
>>>>>>
>>>>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1 for the addition of a keyword in front of the integer defining how many nodes the server needs to wait for.
>>>>>
>>>>> Thank you for the reply. "{}" or "[]" are not bad, but because they are not intuitive, I thought it would be hard for users to use a different method for each purpose.
>>>>
>>>> I think the "any" keyword is more explicit and understandable, also closer to SQL. So I would be in favor of doing that.
>>>
>>> +1
>>>
>>> Also I like the following Simon's idea.
>>>
>>> https://www.postgresql.org/message-id/canp8+jlhfbvv_pw6grasnupw+bdk5dctu7gwpnap-+-zwvk...@mail.gmail.com
>>> ---
>>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>>> * any k (n1, n2, n3) – would release waiters as soon as we have the responses from k out of N standbys. “any k” would be faster, so is desirable for performance and resilience
>>> ---
>>
>> +1
>>
>> "synchronous_method" -> "synchronization_method"
>
> Thanks, will fix.
>
>> I'm concerned about the performance of this code. Can we work out a way of measuring it, so we can judge how successful we are at releasing waiters quickly? Thanks
>
> I will measure the performance effect of this code. I'm expecting that the performance ordering is 'first 1 (n1, n2)' > 'any 1 (n1, n2)' > 'first 2 (n1, n2)'; 'first 1 (n1, n2)' will be the highest throughput.

Sorry, that's wrong. 'any 1 (n1, n2)' will be the highest throughput, or the same as 'first 1 (n1, n2)'.

>> For 9.6 we implemented something that allows the DBA to define how slow programs are. Previously, since 9.1, this was something specified on the application side. I would like to put it back that way, so we end up with a parameter on the client, e.g. commit_quorum = k. Forget the exact parameters/user API for now, but I'd like to allow the code to work with user-defined settings. Thanks.
>
> I see. The parameter on the client should take effect for the priority method as well. And similarly to synchronous_commit, the client can specify how many standbys the master waits for at commit, according to the synchronization method, even if s_s_names is defined.
>
> Regards,
>
> -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Bug in two-phase transaction recovery
On Thu, Sep 8, 2016 at 12:13 PM, Michael Paquier wrote:
> On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich wrote:
>> Some time ago the two-phase state file format was changed to have a variable-size GID, but several places that read those files were not updated to use the new offsets. The problem exists in master and 9.6 and can be reproduced on prepared transactions with savepoints.
>
> Oops and meh. This merits an open item, and had better be fixed by 9.6.0. I am glad you noticed that honestly. And we had better take care of this issue as soon as possible.

Indeed, it's a bug. Thanks Stas for tracking it down and Michael for the review and for checking other places. I also looked at the patch and it seems fine to me. FWIW, I looked at all other places where TwoPhaseFileHeader is referred to and they look safe to me.

Thanks, Pavan

-- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane wrote:
> I really don't like this solution. Changing this one function, out of the dozens just like it in lsyscache.c, seems utterly random and arbitrary.
>
> I'm actually not especially convinced that passing domain_in a random OID needs to not produce an XX000 error. That isn't a user-facing function and people shouldn't be calling it by hand at all. The same goes for record_in. But if we do want those cases to avoid this, I think it's appropriate to fix it in those functions, not decree that essentially-random other parts of the system should bear the responsibility. (Thought experiment: if we changed the order of operations in domain_in so that getBaseTypeAndTypmod were no longer the first place to fail, would we undo this change and then change wherever the failure had moved to? That's pretty messy.)
>
> In the case of record_in, it seems like it'd be easy enough to use lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc() and then throw a user-facing error right there.

Actually, when we pass an invalid type to record_in, the error is thrown in the lookup_type_cache function, and that function does not take a "no_error" input (that is limited to lookup_rowtype_tupdesc_internal). One option is to pass a "no_error" parameter to lookup_type_cache as well, and throw the error only in record_in. But the problem is that lookup_type_cache has a lot of callers. And also, would it be a good idea to throw one generic error instead of multiple specific errors?

Another option is to do for record_in the same thing you have suggested for domain_in (an extra syscache lookup)?

> Not sure about a nice solution for domain_in. We might have to resort to an extra syscache lookup there, but even if we did, it should happen only once per query so that's not very awful.
>
>> 3. CheckFunctionValidatorAccess: This is being called from all language validator functions.
>
> This part seems reasonable, since the validator functions are documented as something users might call, and CheckFunctionValidatorAccess seems like an apropos place to handle it.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
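For concreteness, the record_in change Tom describes might look roughly like this, assuming the no-error lookup could be made to return NULL for a bogus type OID rather than erroring out earlier in lookup_type_cache (the error wording is invented for illustration):

	/* Sketch only; assumes the noError path reports failure as NULL. */
	tupdesc = lookup_rowtype_tupdesc_noerror(tupType, tupTypmod, true);
	if (tupdesc == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("type %u is not a composite type", tupType)));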
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On 09/08/2016 03:36 AM, Peter Geoghegan wrote:
> On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane wrote:
>> The reason it's called siftup is that that's what Knuth calls it. See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.
>
> I see that Knuth explains that these steps form the siftup procedure. What steps does tuplesort_heap_insert correspond to, if any?

Hmm. The loop in tuplesort_heap_siftup() indeed looks the same as Knuth's steps H3-H8. But the start is different. Knuth's algorithm begins with the situation that the top node of the sub-heap is in the wrong place, and it pushes it downwards until the whole sub-heap satisfies the heap condition again. But tuplesort_heap_siftup() begins by moving the last value in the heap to the top, and then it does the H3-H8 steps to move it to the right place.

Using Wikipedia's terms, the tuplesort_heap_siftup() function as a whole is the Extract operation. And the loop inside it corresponds to the Max-Heapify operation, which is the same as Knuth's "siftup".

I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.

Interestingly, as Knuth explains the siftup algorithm, it performs a "replace the top" operation, rather than "remove the top". But we use it to remove the top, by moving the last element to the top and running the algorithm after that. Peter's other patch, which changes the way we currently replace the top node so that it is done in one pass rather than as delete+insert, is actually Knuth's siftup algorithm.

Peter, looking at your "displace" patch in this light, I think tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm calling it now) should share a common subroutine. Displace replaces the top node with the new node passed as an argument, and then calls the subroutine to push it down to the right place. Delete_top replaces the top node with the last value in the heap, and then calls the subroutine. Or perhaps delete_top should remove the last value from the heap, and then call displace() with it, to re-insert it.

- Heikki
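A sketch of what that shared subroutine could look like, for a binary min-heap (the COMPARE placeholder stands in for tuplesort's actual comparator; the names are illustrative, not tuplesort's code):

/*
 * Push *tuple down from slot i of a binary min-heap of n elements until
 * the heap property holds again, then store it there.  delete_top would
 * call this with i = 0 and tuple = the former last element; root_displace
 * would call it with i = 0 and tuple = the replacement.
 */
static void
sift_down(SortTuple *heap, int n, int i, const SortTuple *tuple)
{
	for (;;)
	{
		int			child = 2 * i + 1;

		if (child >= n)
			break;
		/* pick the smaller of the two children */
		if (child + 1 < n && COMPARE(&heap[child + 1], &heap[child]) < 0)
			child++;
		if (COMPARE(tuple, &heap[child]) <= 0)
			break;				/* *tuple sinks no further */
		heap[i] = heap[child];
		i = child;
	}
	heap[i] = *tuple;
}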
Re: [HACKERS] Supporting SJIS as a database encoding
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro HORIGUCHI

> $ time psql postgres -c 'select t.a from t, generate_series(0, )' > /dev/null
>
> real 0m22.696s
> user 0m16.991s
> sys  0m0.182s
>
> Using binsearch the result for the same operation was
>
> real 0m35.296s
> user 0m17.166s
> sys  0m0.216s
>
> Returning in UTF-8 bloats the result string by about 1.5 times, so it doesn't seem to make sense comparing with it. But it takes real = 47.35s.

Cool, 36% speedup! Does this difference vary depending on the actual characters used? E.g. would the speedup be greater if most of the characters are ASCII?

Regards
Takayuki Tsunakawa
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 7, 2016 at 7:22 PM, Kuntal Ghosh wrote:
> Hello,

Could you avoid top-posting please? More reference here: http://www.idallen.com/topposting.html

> - If WAL consistency check is enabled for a rmgrID, we always include the backup image in the WAL record.

What happens if wal_consistency has different settings on a standby and its master? If for example it is set to 'all' on the standby and 'none' on the master, or vice versa, how do things react? An update of this parameter should be WAL-logged, no?

> - I've extended the RmgrTable with a new function pointer rm_checkConsistency, which is called after rm_redo (only when WAL consistency check is enabled for this rmgrID).
> - In each rm_checkConsistency, both backup pages and buffer pages are masked accordingly before any comparison.

This leads to heavy code duplication...

> - In postgresql.conf, a new guc variable named 'wal_consistency' is added. The default value of this variable is 'None'. Valid values are combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist, BRIN, Generic and XLOG. It can also be set to 'All' to enable all the values.

Lower-case is the usual policy for parameter values of GUC parameters.

> - In the recovery tests (src/test/recovery/t), I've added the wal_consistency parameter to the existing scripts. This feature doesn't change the expected output. If there is any inconsistency, it can be verified in the corresponding log file.

I am afraid that just generating a WARNING message is going to be useless for the buildfarm. If we want to detect errors, we could for example have an additional GUC to trigger an ERROR or a FATAL, taking down the cluster, and allowing things to show in red on a platform.

> Results
>
> I've tested with installcheck and installcheck-world in a master-standby set-up. The following are the configuration parameters.

So you tested as well the recovery tests, right?

> I got two types of inconsistencies, as follows:
>
> 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In the backup page, the BTP_DELETED and BTP_LEAF flags are both set, whereas after redo only the BTP_DELETED flag is set in the buffer page. I assume that we should clear all btpo_flags before setting BTP_DELETED in _bt_unlink_halfdead_page().

The page is deleted, it does not matter, so you could just mask all the flags for a deleted page...

[...]
+	/*
+	 * Mask everything on a DELETED page.
+	 */
+	if (((BTPageOpaque) PageGetSpecialPointer(page_norm))->btpo_flags & BTP_DELETED)

And that's what is happening.

> 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in the REVMAP page. This happens only for two cases. I'm not sure what the reason can be.

Hm? This smells like a block reference bug. What are the cases you are referring to?

> I haven't done sufficient tests yet to measure the overhead of this modification. I'll do that next.

I did a first pass on your patch, and I think that things could be really reduced. There is much code duplication, but see below for the details...

 #include "access/xlogutils.h"
-
+#include "storage/bufmask.h"

I know that I am a noisy one on the matter, but please double-check for such useless noise in your patch. And there is not only one.

+	newwalconsistency = (bool *) guc_malloc(ERROR,(RM_MAX_ID + 1)*sizeof(bool));

This spacing is not project-style.
You may want to go through that: https://www.postgresql.org/docs/devel/static/source.html

+$node_master->append_conf(
+	'postgresql.conf', qq(
+wal_consistency = 'All'
+));

Instead of duplicating that 7 times, you could just do it once in the init() method of PostgresNode.pm. This really has meaning if enabled by default.

+	/*
+	 * Followings are the rmids which can have backup blocks.
+	 * We'll enable this feature only for these rmids.
+	 */
+	newwalconsistency[RM_HEAP2_ID] = true;
+	newwalconsistency[RM_HEAP_ID] = true;
+	newwalconsistency[RM_BTREE_ID] = true;
+	newwalconsistency[RM_HASH_ID] = true;
+	newwalconsistency[RM_GIN_ID] = true;
+	newwalconsistency[RM_GIST_ID] = true;
+	newwalconsistency[RM_SEQ_ID] = true;
+	newwalconsistency[RM_SPGIST_ID] = true;
+	newwalconsistency[RM_BRIN_ID] = true;
+	newwalconsistency[RM_GENERIC_ID] = true;
+	newwalconsistency[RM_XLOG_ID] = true;

Here you can just use MemSet with RM_MAX_ID and simplify this code's maintenance.

+	for(i = 0; i < RM_MAX_ID + 1 ; i++)
+		newwalconsistency[i] = false;
+	break;

Same here, you can just use MemSet.

+	else if (pg_strcasecmp(tok, "NONE") == 0)
[...]
+	else if (pg_strcasecmp(tok, "ALL") == 0)

It seems to me that using NONE or ALL with any other keywords should not be allowed.

+	if (pg_strcasecmp(tok, "Heap2") == 0)
+	{
+		newwalconsistency[RM_HEAP2_ID]
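For reference, the MemSet simplification suggested above would collapse both blocks to one line each (a sketch, using the patch's newwalconsistency array and treating every rmgr as checkable, as the review implies):

	/* 'all': enable the check for every resource manager at once */
	MemSet(newwalconsistency, true, (RM_MAX_ID + 1) * sizeof(bool));

	/* 'none': disable it everywhere */
	MemSet(newwalconsistency, false, (RM_MAX_ID + 1) * sizeof(bool));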
Re: [HACKERS] Parallel build with MSVC
* Noah Misch wrote:

> Committed.

Many apologies for coming in late again, but I just realized it would be better if the user-controlled flags came after all predefined options the user might want to override. Right now that is only /verbosity in both the build and clean operations.

Patch attached, also reordering the ecpg-clean command line in clean.bat to match the others that have the project file first.

-- Christian

>From 09a5f3945e2b87b56ca3525a56db970e9ecf8ffd Mon Sep 17 00:00:00 2001
From: Christian Ullrich
Date: Thu, 8 Sep 2016 08:34:45 +0200
Subject: [PATCH] Let MSBFLAGS be used to override predefined options.

---
 src/tools/msvc/build.pl  | 4 ++--
 src/tools/msvc/clean.bat | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/tools/msvc/build.pl b/src/tools/msvc/build.pl
index 5273977..a5469cd 100644
--- a/src/tools/msvc/build.pl
+++ b/src/tools/msvc/build.pl
@@ -54,7 +54,7 @@ elsif (uc($ARGV[0]) ne "RELEASE")
 if ($buildwhat and $vcver >= 10.00)
 {
 	system(
-		"msbuild $buildwhat.vcxproj $msbflags /verbosity:normal /p:Configuration=$bconf"
+		"msbuild $buildwhat.vcxproj /verbosity:normal $msbflags /p:Configuration=$bconf"
 	);
 }
 elsif ($buildwhat)
@@ -63,7 +63,7 @@ elsif ($buildwhat)
 }
 else
 {
-	system("msbuild pgsql.sln $msbflags /verbosity:normal /p:Configuration=$bconf");
+	system("msbuild pgsql.sln /verbosity:normal $msbflags /p:Configuration=$bconf");
 }
 
 # report status
diff --git a/src/tools/msvc/clean.bat b/src/tools/msvc/clean.bat
index e21e37f..b972ddf 100755
--- a/src/tools/msvc/clean.bat
+++ b/src/tools/msvc/clean.bat
@@ -107,6 +107,6 @@ REM for /r %%f in (*.sql) do if exist %%f.in del %%f
 cd %D%
 
 REM Clean up ecpg regression test files
-msbuild %MSBFLAGS% /NoLogo ecpg_regression.proj /t:clean /v:q
+msbuild ecpg_regression.proj /NoLogo /v:q %MSBFLAGS% /t:clean
 
 goto :eof
-- 
2.10.0.windows.1
Re: [HACKERS] GiST penalty functions [PoC]
On 09/08/2016 09:39 AM, Михаил Бахтерев wrote:
> Excuse me for the intervention.
>
> It depends. For instance, I run PostgreSQL on a modern MIPS CPU, which does not have sqrt support. But you are right, it is supported in most cases. And if execution speed of this very function is of concern, sqrtf(x) should be used instead of sqrt(x).
>
> Despite this, Andrew's solution gives a more accurate representation of values. And as far as I understand, this improves overall performance by decreasing the overall amount of instructions which must be executed.

BTW, I would be OK with the bit-twiddling hack, if we had an autoconf check for IEEE 754 floats, and a graceful fallback for other systems. The fallback could be simply the current penalty function. You wouldn't get the benefit of the better penalty function on non-IEEE systems then, but it would still be correct.

> It is possible to speed up Andrew's implementation and get rid of warnings by using bit-masks and unions. Something like:
>
> union {
>	float f;
>	struct {
>		unsigned int mantissa:23, exponent:8, sign:1;
>	} bits;
> }
>
> I am sorry, I have no time to check this. But it is common wisdom to avoid pointer-based memory accesses in high-performance code, as they create a lot of false write-to-read dependencies.

The compiler should be smart enough to generate the same instructions either way. A union might be more readable, though. (We don't need to extract the mantissa, exponent and sign, so a union of float and int32 would do.)

- Heikki
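Such a check could be as simple as the following sketch (illustrative only; an autoconf test would do the equivalent at configure time):

#include <string.h>
#include <stdint.h>
#include <stdbool.h>

/* Crude runtime check that float matches IEEE-754 binary32. */
static bool
float_is_ieee754(void)
{
	float		one = 1.0f;
	uint32_t	bits;

	if (sizeof(float) != sizeof(uint32_t))
		return false;
	memcpy(&bits, &one, sizeof(bits));
	return bits == 0x3f800000;	/* the IEEE-754 binary32 pattern for 1.0 */
}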
Re: [HACKERS] Sample configuration files
On 08/29/2016 03:34 AM, Vik Fearing wrote:
> We have sample configuration files for postgresql.conf and recovery.conf, but we do not have them for contrib modules. This patch attempts to add them.
>
> Although the patch puts the sample configuration files in the tree, it doesn't try to install them. That's partly because I think it would need an extension version bump and that doesn't seem worth it.
>
> Also, while writing this patch, it crossed my mind that the plpgsql variables should probably be in the main postgresql.conf file because we install it by default. I will happily write that patch if desired, independently of this patch.
>
> Current as of 26fa446, added to next commitfest.

I noticed that this patch has been marked Waiting on Author with no comment. Peter, what more should I be doing right now while waiting for Martín's review?

-- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Bug in two-phase transaction recovery
On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich wrote:
> Some time ago the two-phase state file format was changed to have a variable-size GID, but several places that read those files were not updated to use the new offsets. The problem exists in master and 9.6 and can be reproduced on prepared transactions with savepoints.

Oops and meh. This merits an open item, and had better be fixed by 9.6.0. I am glad you noticed that honestly. And we had better take care of this issue as soon as possible.

> For example:
>
> create table t(id int);
> begin;
> insert into t values (42);
> savepoint s1;
> insert into t values (43);
> prepare transaction 'x';
> begin;
> insert into t values (142);
> savepoint s1;
> insert into t values (143);
> prepare transaction 'y';
>
> and restart the server. Startup process will hang. Fix attached.

In crash recovery this is very likely to fail with an assertion if those are enabled:

TRAP: FailedAssertion("!(TransactionIdFollows(subxid, xid))", File: "twophase.c", Line: 1767)

And the culprit is e0694cf9, which makes this open item owned by Simon.

I also had a look at the patch you are proposing, and I think that's a correct fix. I also looked at the other code paths scanning the 2PC state file and did not notice extra problems. The good news is that the 2PC file generation is not impacted, only its scan at recovery is, so the impact of the bug is limited for existing 9.6 deployments where both 2PC and subtransactions are involved.

> Also, while looking at StandbyRecoverPreparedTransactions() I've noticed that the buffer for the 2PC file is allocated in TopMemoryContext but never freed. That has probably existed for a long time.

@@ -1886,6 +1886,8 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
 			Assert(TransactionIdFollows(subxid, xid));
 			SubTransSetParent(xid, subxid, overwriteOK);
 		}
+
+	pfree(buf);
 }

This one is a good catch. I have checked also the other callers of ReadTwoPhaseFile but I am not seeing any other leak. That's a leak, not critical though, so applying it only on HEAD would be enough IMO.

-- Michael
Re: [HACKERS] Parallel build with MSVC
On Thu, Sep 8, 2016 at 3:54 PM, Christian Ullrich wrote:
> Many apologies for coming in late again, but I just realized it would be better if the user-controlled flags came after all predefined options the user might want to override. Right now that is only /verbosity in both the build and clean operations.
>
> Patch attached, also reordering the ecpg-clean command line in clean.bat to match the others that have the project file first.

-"msbuild $buildwhat.vcxproj $msbflags /verbosity:normal /p:Configuration=$bconf"
+"msbuild $buildwhat.vcxproj /verbosity:normal $msbflags /p:Configuration=$bconf"

Why not... If people are willing to switch to /verbosity:detailed and double the amount of build time...

-- Michael
Re: [HACKERS] Optimization for lazy_scan_heap
On Wed, Sep 7, 2016 at 4:11 PM, Simon Riggs wrote:
> On 7 September 2016 at 04:13, Masahiko Sawada wrote:
>
>> Since current HEAD could scan the visibility map twice, the execution time of Patched is approximately half of HEAD.
>
> Sounds good.
>
> To ensure we are doing exactly the same amount of work as before, did you see the output of VACUUM VERBOSE?

Sorry, the previous test result I posted was wrong. I reran the performance test and the results are:

* 1TB table (visibility map size is 32MB)
HEAD    : 4853.250 ms (00:04.853)
Patched : 3805.789 ms (00:03.806)

* 8TB table (visibility map size is 257MB)
HEAD    : 37853.891 ms (00:37.854)
Patched : 30153.943 ms (00:30.154)

* 32TB table (visibility map size is 1GB)
HEAD    : 151908.344 ms (02:31.908)
Patched : 120560.037 ms (02:00.560)

Since the visibility map pages can be cached in shared buffers or the OS cache by the first scan, the benefit of this patch seems not to be large.

Here are the outputs of VACUUM VERBOSE for the 32TB table.

* HEAD
INFO:  vacuuming "public.vm_skip_test"
INFO:  "vm_skip_test": found 0 removable, 0 nonremovable row versions in 0 out of 4294967294 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
Skipped 4294967294 all-frozen pages according to visibility map.
0 pages are entirely empty.
CPU 1.06s/148.11u sec elapsed 149.20 sec.
VACUUM
Time: 151908.344 ms (02:31.908)

* Patched
INFO:  vacuuming "public.vm_skip_test"
INFO:  "vm_skip_test": found 0 removable, 0 nonremovable row versions in 0 out of 4294967294 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
Skipped 4294967294 all-frozen pages according to visibility map.
0 pages are entirely empty.
CPU 0.65s/117.15u sec elapsed 117.81 sec.
VACUUM
Time: 120560.037 ms (02:00.560)

The current manual vacuum doesn't output how many all_frozen pages we skipped according to the visibility map. That's why I attached the 0001 patch, which makes the manual vacuum emit such information.

> Can we produce a test that verifies the result patched/unpatched?

Attached is a test shell script, but because I don't have such a large disk, I've measured the performance benefit in a somewhat unofficial way.

To make a situation where the table is extremely large and has a corresponding visibility map, I applied the 0002 patch and made a fake visibility map. The attached 0002 patch adds a GUC parameter cheat_vacuum_table_size which artificially defines the size of the table being vacuumed. For example, if we do

SET cheat_vacuum_table_size = 4;
VACUUM vm_test;

then in lazy_scan_heap, the vm_test table is processed as an 8TB (MaxBlockNumber / 4) table.

The attached test shell script makes fake visibility map files and executes the performance tests for 1TB, 8TB and 32TB tables. Please confirm it.

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

From bbda8e4144101a41804865d88592a15bf0150340 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Thu, 8 Sep 2016 11:24:21 -0700
Subject: [PATCH 1/2] lazy_scan_heap outputs how many all_frozen pages are skipped.

---
 src/backend/commands/vacuumlazy.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5fb325..20d8334 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1331,6 +1331,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 							 "Skipped %u pages due to buffer pins.\n",
 							 vacrelstats->pinskipped_pages),
 					 vacrelstats->pinskipped_pages);
+	appendStringInfo(&buf, ngettext("Skipped %u all-frozen page according to visibility map.\n",
+							 "Skipped %u all-frozen pages according to visibility map.\n",
+							 vacrelstats->frozenskipped_pages),
+					 vacrelstats->frozenskipped_pages);
 	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
 							 "%u pages are entirely empty.\n",
 							 empty_pages),
-- 
2.8.1

From 01e493a44ba6c5bb9b59a1016bfce4b9bcf75e95 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Thu, 8 Sep 2016 11:36:28 -0700
Subject: [PATCH 2/2] Add cheat_vacuum_table_size.

---
 contrib/pg_visibility/pg_visibility.c |  6 +-
 src/backend/commands/vacuumlazy.c     |  9 +++--
 src/backend/utils/misc/guc.c          | 10 ++++++
 src/include/commands/vacuum.h         |  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7034066..37ace99 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -12,6 +12,7 @@
 #include "access/visibilitymap.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage_xlog.h"
+#include "commands/vacuum.h"
 #include "funcapi.h"
 #include
Re: [HACKERS] Supporting SJIS as a database encoding
Hello,

At Wed, 07 Sep 2016 16:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20160907.161304.112519789.horiguchi.kyot...@lab.ntt.co.jp>
>> Implementing the radix tree code, then redefining the format of the mapping table to support a radix tree, then modifying the mapping generator script are needed.
>>
>> If no one opposes this, I'll do that.

So, I did that as a PoC. The radix tree takes a little less than 100k bytes (far smaller than expected :) and it is definitely faster than binsearch.

The attached patch does the following things:

- Defines a struct for a static radix tree (utf_radix_tree). Currently it supports up to 3-byte encodings.
- Adds a map generator script UCS_to_SJIS_radix.pl, which generates utf8_to_sjis_radix.map from utf8_to_sjis.map.
- Adds a new conversion function utf8_to_sjis_radix.
- Modifies UtfToLocal so as to allow the map to be NULL.
- Modifies utf8_to_sjis to use the new conversion function instead of ULmapSJIS.

The following remains to be done:

- utf8_to_sjis_radix could be more generic.
- SJIS->UTF8 is not implemented, but it would be easily done since there's no difference in using the radix tree mechanism (the output character is currently assumed to be 2 bytes long, though).
- It doesn't support 4-byte codes, so this is not applicable to sjis_2004. Extending the radix tree to support 4 bytes wouldn't be hard.

The following is the result of a simple test.

=# create table t (a text);
=# alter table t alter column a storage plain;
=# insert into t values ('... 7130 characters containing (I believe) all characters in SJIS encoding');
=# insert into t values ('... 7130 characters containing (I believe) all characters in SJIS encoding');

(Doing that twice is just my mistake.)

$ export PGCLIENTENCODING=SJIS
$ psql -c '\encoding' postgres
SJIS
$ time psql postgres -c 'select t.a from t, generate_series(0, )' > /dev/null

real 0m22.696s
user 0m16.991s
sys  0m0.182s

Using binsearch the result for the same operation was

real 0m35.296s
user 0m17.166s
sys  0m0.216s

Returning in UTF-8 bloats the result string by about 1.5 times, so it doesn't seem to make sense comparing with it. But it takes real = 47.35s.

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center

0001-Use-radix-tree-to-encoding-characters-of-Shift-JIS.patch.gz
Description: Binary data
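To illustrate the general idea (the tables and layout below are invented, not the patch's actual utf_radix_tree format): each byte of a 3-byte UTF-8 sequence indexes one level of flat arrays, so a lookup is a few dependent loads with no key comparisons, unlike a binary search:

#include <stdint.h>

/* Hypothetical generated tables; the real patch builds its tables from
 * utf8_to_sjis.map.  Index 0 means "no mapping at this level". */
extern const uint16_t sjis_l1[256];		/* lead byte -> level-2 chunk */
extern const uint16_t sjis_l2[][64];	/* chunk x low 6 bits of byte 2 -> level-3 chunk */
extern const uint16_t sjis_l3[][64];	/* chunk x low 6 bits of byte 3 -> SJIS code */

/* Look up a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx);
 * returns the SJIS code, or 0 if the character is unmapped. */
static uint16_t
utf8_to_sjis_lookup(const unsigned char *u)
{
	uint16_t	chunk = sjis_l1[u[0]];

	if (chunk == 0)
		return 0;
	chunk = sjis_l2[chunk][u[1] & 0x3f];
	if (chunk == 0)
		return 0;
	return sjis_l3[chunk][u[2] & 0x3f];
}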
Re: [HACKERS] Suggestions for first contribution?
On Wed, Sep 07, 2016 at 05:14:44PM +0300, Aleksander Alekseev wrote:
> Here is another idea for a contribution - refactoring.

Refactoring is not a good early contribution. Refactoring is more likely to succeed once you internalize community code practices through much study of functional patches. However, even committer-initiated refactoring has a high failure rate.
Re: [HACKERS] [PATCH] Alter or rename enum value
On 7 September 2016 at 22:14, Tom Lane wrote:
> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> Here is version 6 of the patch, which just adds RENAME VALUE with no IF [NOT] EXISTS, rebased onto current master (particularly the transactional ADD VALUE patch).
>
> Pushed with some adjustments. The only thing that wasn't a matter of taste is you forgot to update the backend/nodes/ support functions for struct AlterEnumStmt.
>
> regards, tom lane

Thank you all for committing this feature!

Given that you are now familiar with the internals of how enums are implemented, would it be possible to continue the work and add the "ALTER TYPE DROP VALUE" command?

Thank you!

Regards, Matthias
Re: [HACKERS] GiST penalty functions [PoC]
> BTW, would someone explain to me why using a float here will not fail catastrophically for inputs outside the range of float?

Indeed, it will. And that's one of the reasons I'm considering advancing the GiST API instead of advancing extensions. It won't crash, but it will produce a terrible index, not suitable for performing efficient queries.

GiST treats the penalty this way:

/* disallow negative or NaN penalty */
if (isnan(penalty) || penalty < 0.0)
	penalty = 0.0;

Any NaN is a "perfect fit". Calculating a real (say float) penalty and choosing the subtree with the smallest penalty is an idea from '92. A modern spatial index requires more than just a float to compare subtrees.

Best regards, Andrey Borodin, Octonica & Ural Federal University.
Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
Michael Paquier writes:
> But I don't understand the reason behind such a restriction, to be honest, because libxml2 does not depend on libxslt. The contrary is true: libxslt needs libxml2.

Right.

> Note as well that libxml2 does depend on ICONV though.

Hm, is that true either? I don't see any sign of such a dependency on Linux:

$ ldd /usr/lib64/libxml2.so.2.7.6
	linux-vdso.so.1 =>  (0x7fff98f6b000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00377a60)
	libz.so.1 => /lib64/libz.so.1 (0x00377ae0)
	libm.so.6 => /lib64/libm.so.6 (0x003779e0)
	libc.so.6 => /lib64/libc.so.6 (0x003779a0)
	/lib64/ld-linux-x86-64.so.2 (0x00377960)

regards, tom lane
Re: [HACKERS] Suggestions for first contribution?
On Wed, Sep 7, 2016 at 10:44 AM, Stas Kelvich wrote:
> There is also a list of projects for Google Summer of Code: https://wiki.postgresql.org/wiki/GSoC_2016
>
> Those topics are expected to be doable by a newcomer during several months. The list is also slightly outdated, but you can always check the current state of things by searching this mailing list for the topic of interest.
>
> Also, Postgres has several internal APIs, like FDW [1] or GiST [2], that allow you to implement something useful without digging too much into Postgres internals.
>
> [1] https://www.postgresql.org/docs/current/static/fdwhandler.html
> [2] https://www.postgresql.org/docs/current/static/gist-extensibility.html

Thanks Stas, I'll have a look at those. I appreciate the links.

- Christian
Re: [HACKERS] \timing interval
2016-09-08 13:10 GMT+02:00 Craig Ringer:
> On 4 Sep. 2016 3:36 am, "Tom Lane" wrote:
>> After further thought I concluded that not providing any labeling of days is a bad idea.
>
> Yeah. I think labeling days is definitely good. I'm glad you changed that.
>
> Personally I'd like to trim milliseconds when dealing with minute+ long runs and seconds from hour+ runs too, since it's all there in the ms output and the units output is for human readability. I see the value of retaining full precision too, though, and don't feel strongly about it.

It can be hard to read without units. I know it is maybe too late now, but the current output is not too readable for me. I miss units - 10s, 1m20.5s, 1h 30m 5s.

Regards
Pavel
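For illustration, the unit style Pavel describes could be produced by something like this (a sketch, not psql's actual code):

#include <stdio.h>
#include <stddef.h>

/* Format elapsed milliseconds as "1h 30m 5s", "1m20.5s", or "0.123s". */
static void
format_elapsed(double ms, char *buf, size_t buflen)
{
	double		seconds = ms / 1000.0;
	int			hours = (int) (seconds / 3600.0);
	int			minutes = (int) (seconds / 60.0) % 60;

	seconds -= 3600 * hours + 60 * minutes;
	if (hours > 0)
		snprintf(buf, buflen, "%dh %dm %.0fs", hours, minutes, seconds);
	else if (minutes > 0)
		snprintf(buf, buflen, "%dm%.1fs", minutes, seconds);
	else
		snprintf(buf, buflen, "%.3fs", seconds);
}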
Re: [HACKERS] GiST penalty functions [PoC]
Excuse me for the intervention.

It depends. For instance, I run PostgreSQL on a modern MIPS CPU, which does not have sqrt support. But you are right, it is supported in most cases. And if execution speed of this very function is of concern, sqrtf(x) should be used instead of sqrt(x).

Despite this, Andrew's solution gives a more accurate representation of values. And as far as I understand, this improves overall performance by decreasing the overall amount of instructions which must be executed.

It is possible to speed up Andrew's implementation and get rid of warnings by using bit-masks and unions. Something like:

union {
	float f;
	struct {
		unsigned int mantissa:23, exponent:8, sign:1;
	} bits;
}

I am sorry, I have no time to check this. But it is common wisdom to avoid pointer-based memory accesses in high-performance code, as they create a lot of false write-to-read dependencies.

- Mikhail, respectfully

On Wed, Sep 07, 2016 at 05:58:42PM -0400, Tom Lane wrote:
> Heikki Linnakangas writes:
>> Unfortunately, sqrt(x) isn't very cheap.
>
> You'd be surprised: sqrt is built-in on most modern hardware. On my three-year-old workstation, sqrt(x) seems to take about 2.6ns. For comparison, the pack_float version posted in <...> takes 3.9ns (and draws multiple compiler warnings, too).
>
> regards, tom lane
Re: [HACKERS] High-CPU consumption on information_schema (only) query
On 8 Sep. 2016 7:38 am, "Robins Tharakan" wrote:
> Hi,
>
> An SQL query (with only information_schema-related JOINs), when triggered, runs with max CPU (and never ends - killed after 2 days).
> - It runs similarly (very slow) on a replicated server that acts as a read-only slave.
> - Top shows only postgres hitting max CPU (nothing else). When the query is killed, CPU is near 0%.
> - When the DB is restored on a separate test server (with the exact postgresql.conf), the same query works fine.
> - There is no concurrent usage on the replicated / test server (although the primary is a production server and has concurrent users).
>
> Questions:
> - If this was a postgres bug or a configuration issue, the query on the restored DB should have been slow too. Is there something very basic I am missing here?
>
> If someone asks for it I could provide SQL + EXPLAIN, but it feels irrelevant here. I'm not looking for a specific solution, but what else should I be looking for here?

Get a series of stack traces. Perf with stack output would be good too. You need debug info for both.
Re: [HACKERS] Suggestions for first contribution?
On Wed, Sep 7, 2016 at 10:14 AM, Aleksander Alekseev wrote:
> Here is another idea for a contribution - refactoring.
>
> Currently there are a lot of procedures in PostgreSQL code that definitely don't fit on one screen (i.e. ~50 lines). Also many files are larger than, say, 1000 lines of code. For instance, the psql_completion procedure is more than 2000 lines long! I think patches that would make such code easier to read would have a good chance of being accepted.

Thanks for the suggestion! I can see how refactoring could make a lot of sense for some functions. That's probably a task I'd want to take on after I have more experience with PG's code base. I think being familiar with most/all of PG's code would make that task go more smoothly.

- Christian
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/09/06 22:07, Ashutosh Bapat wrote:
> On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita wrote:
>> On 2016/08/22 15:49, Ashutosh Bapat wrote:
>>> 1. deparsePlaceHolderVar looks odd - each of the deparse* functions is named as deparse + <node name>. PlaceHolderVar is not a parser node, so no string is going to be parsed as a PlaceHolderVar. Maybe you want to name it as deparseExprFromPlaceholderVar or something like that.
>>
>> The name "deparseExprFromPlaceholderVar" seems long to me. How about "deparsePlaceHolderExpr"?
>
> There isn't any node with the name PlaceHolderExpr.

I'll rename it to "deparseExprInPlaceholderVar", then.

>>> 3. I think the registerAlias stuff should really happen at the time of creating paths and should be stored in fpinfo. Without that it will be computed every time we deparse the query. This means every time we try to EXPLAIN the query at various levels in the join tree, and once for the query to be sent to the foreign server.
>>
>> Hmm. I think the overhead of calculating aliases would be negligible compared to the overhead of explaining each remote query for costing or sending the remote query for execution. So, I created aliases in the same way remote params are created and stored into params_list in deparse_expr_cxt. I'm not sure it's worth complicating the code.
>
> We defer remote parameter creation till deparse time since the parameter numbers are dependent upon the sequence in which we deparse the query. Creating them at the time of path creation and storing them in fpinfo is not possible because a. those present in the joining relations will conflict with each other and need some kind of serialization at the time of deparsing, and b. they differ for differently parameterized paths or paths with different pathkeys. We don't defer it because it's very light on performance. That's not true with the alias information. As long as we detect which relations need subqueries, their RTIs are enough to create unique aliases, e.g. if a relation involves RTIs 1, 2, 3, the corresponding subquery can have alias r123 without conflicting with any other alias.

Hmm. But another concern about the approach of generating a subselect alias for a path, if needed, at path creation time would be that the path might be rejected by add_path, which would result in totally wasted cycles for generating the subselect alias for that path.

> However minimal an overhead it might have, we will deparse the query every time we create a path involving that kind of relation, i.e. for different pathkeys, different parameterization and different joins in which that relation participates. This number can be significant. We want to avoid this overhead if we can.

Exactly. As I said above, though, I think the overhead would be negligible compared to the overhead of explaining each remote query for costing or the overhead of sending the final remote query for execution.

>>> 5. The blocks related to inner and outer relations in deparseFromExprForRel() look the same. We should probably separate that code out into a function and call it at two places.
>>
>> Done.
>
> I see you have created the function deparseOperandRelation() for the same. I guess this should be renamed to deparseRangeTblRef(), since it will create a RangeTblRef node when parsed back.

OK, if there are no objections from others, I'll rename it to the name proposed by you, "deparseRangeTblRef".

> 6.
> !			if (is_placeholder)
> !				errcontext("placeholder expression at position %d in select list",
> !						   errpos->cur_attno);
>
> A user wouldn't know what a placeholder expression is; there is no such term in the documentation.
We have to devise a better way to provide an error context for an expression in general. Though I proposed that, I don't think that it's that important to let users know that the expression is from a PHV. How about just saying "expression", not "placeholder expression", so that we have the message "expression at position %d in select list" in the context? Hmm, that's better than placeholder expression, but not as explanatory as it should be, since we won't be printing the "select list" in the error context and the user won't have a clue as to what the error context is about. I don't think so. Consider an example of the conversion error message, which is from the regression test: SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; ERROR: invalid input syntax for integer: "foo" CONTEXT: whole-row reference to foreign table "ft1" As shown in the example, the error message is displayed under a remote query for execution. So, ISTM it's reasonable to print something like
Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw
On 2016/09/07 13:21, Ashutosh Bapat wrote: * with the patch:

postgres=# explain verbose delete from ft1 using ft2 where ft1.a = ft2.a;
                          QUERY PLAN
--------------------------------------------------------------
 Delete on public.ft1  (cost=100.00..102.04 rows=1 width=38)
   ->  Foreign Delete  (cost=100.00..102.04 rows=1 width=38)
         Remote SQL: DELETE FROM public.t1 r1 USING (SELECT ROW(a, b), a FROM public.t2) ss1(c1, c2) WHERE ((r1.a = ss1.c2))
(3 rows)

The underlying scan on t2 requires ROW(a,b) for locking the row for update/share. But clearly it's not required if the full query is being pushed down. Exactly. Is there a way we can detect that ROW(a,b) is a useless column (not used anywhere in the other parts of the query like RETURNING, the DELETE clause etc.) and eliminate it? I don't have a clear solution for that yet, but I'll try to remove that in the next version. Similarly for a: it's part of the targetlist of the underlying scan so that the WHERE clause can be applied to it. But it's not needed if we are pushing down the query. If we eliminate the targetlist of the query, we could construct a remote query without a subquery in it, making it more readable. Will try to do so also. Thanks for the comments! Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \timing interval
On 4 Sep. 2016 3:36 am, "Tom Lane" wrote: > > After further thought I concluded that not providing any labeling of > days is a bad idea. Yeah. I think labeling days is definitely good. I'm glad you changed that. Personally I'd like to trim milliseconds when dealing with minute+ long runs and seconds from hour+ runs too, since it's all there in the ms output and the units output is for human readability. I see the value of retaining full precision too, though, and don't feel strongly about it.
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Tue, 6 Sep 2016 07:58:28 +0530 Mithun Cy wrote: > On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev < > a.aleks...@postgrespro.ru> wrote: > >After a brief examination I've found following ways to improve the > >patch. > Adding to above comments. > I'm sending a new version of the patch. I've replaced the readonly option with target_server_type (with JDBC compatibility alias targetServerType), used logic of setting defaults based on the number of hosts in the connect string instead of a complicated condition in the state machine, and added checks for return values of memory allocation functions. Also, I've solved the pg_usleep problem by linking pgsleep.c into libpq along with some other libpgport objects which are already linked there. Thus client applications don't need to link with libpgport and the libpq shared library is self-contained. diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 4e34f00..f0df877 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10 The general form for a connection URI is: -postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...] +postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...] @@ -809,6 +809,7 @@ postgresql://localhost/mydb postgresql://user@localhost postgresql://user:secret@localhost postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp +postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&target_server_type=any Components of the hierarchical part of the URI can also be given as parameters. For example: @@ -831,7 +832,9 @@ postgresql:///mydb?host=localhost&port=5433 For improved compatibility with JDBC connection URIs, instances of parameter ssl=true are translated into -sslmode=require. +sslmode=require and +loadBalanceHosts=true into +hostorder=random. @@ -841,6 +844,10 @@ postgresql:///mydb?host=localhost&port=5433 postgresql://[2001:db8::1234]/database + + There can be several host specifications, optionally accompanied + by a port, separated by comma. + The host component is interpreted as described for the parameter PostgreSQL was built). On machines without Unix-domain sockets, the default is to connect to localhost. + + There can be more than one host parameter in + the connect string. In this case these hosts would be considered + alternate entries into the same database and if connecting to the first one + fails, the library would try to connect to the second etc. This can be used + for high availability clusters or for load balancing. See + parameter. + + + A network host name can be accompanied by a port number, separated by + colon. If so, this port number is used only when connecting to + this host. If there is no port number, the port specified in the + parameter would be used. + @@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - + + hostorder + + + Specifies how to choose the host from the list of alternate hosts, + specified in the parameter. + + + If the value of this argument is sequential (the + default) the library connects to the hosts in the order they are specified, + and tries to connect to the second one only if connection to the first + fails. + + + If the value is random, the host to connect to is randomly + picked from the list. It allows balancing the load between several + cluster nodes. However, currently PostgreSQL doesn't support + multimaster clusters. So, without use of third-party products, + only read-only connections can take advantage of the + load-balancing.
See + + + + + target_server_type + + + If this parameter is master, upon successful connection + the library checks whether the host is in recovery state, and if it is, + tries the next host in the connect string. If this parameter is + any, connections to standby nodes are considered + successful. + + + + port @@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - connect_timeout @@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - + + failover_timeout + + + Maximum time to cyclically retry all the hosts in the connect string + (as a decimal integer number of seconds). If not specified, then + hosts are tried just once. + + + If we have a replicating cluster, and the master node fails, it might + take some time to promote one of the standby nodes to the new master. + So clients which notice that connect to the
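To make the proposed interface concrete, here is a minimal client-side sketch using the multi-host syntax and the hostorder/target_server_type options from this patch; these options exist only with the patch applied, and the host names are made up, so treat this as an illustration of the proposal rather than of stock libpq:

```c
#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

int
main(void)
{
	/*
	 * Multi-host URI as proposed: try node1, node2:5433 and node3 in
	 * random order, accepting only a non-recovery (master) server.
	 * hostorder and target_server_type are the patch's proposed options.
	 */
	PGconn	   *conn = PQconnectdb(
		"postgresql://node1,node2:5433,node3/mydb"
		"?hostorder=random&target_server_type=master");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return EXIT_FAILURE;
	}

	/* PQhost/PQport report which of the candidate hosts was chosen */
	printf("connected to %s:%s\n", PQhost(conn), PQport(conn));
	PQfinish(conn);
	return EXIT_SUCCESS;
}
```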
Re: [HACKERS] GiST penalty functions [PoC]
Heikki Linnakangas writes: > BTW, I would be OK with the bit-twiddling hack, if we had an autoconf > check for IEEE 754 floats, and a graceful fallback for other systems. I would still object to the version submitted, on the grounds of the compiler warnings it causes. Possibly those could be avoided with a union, though; Float4GetDatum doesn't produce such warnings. BTW, would someone explain to me why using a float here will not fail catastrophically for inputs outside the range of float? Cubes are based on doubles, not floats. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestions for first contribution?
On Wed, Sep 7, 2016 at 10:34 AM, Yury Zhuravlev wrote: > Christian Convey wrote: >> >> Yury doesn't seem to need help >> with CMake > > Hello. > I am sorry that I am only answering now. > I do need help, but with writing CMake code: > 1. Make ecpg tests > 2. Make MinGW/Msys support > 3. many many ... > all targets and discussion here: > https://github.com/stalkerg/postgres_cmake/issues > > If you can only test CMake on Ubuntu x86_64, that is not necessary. > Sorry if I have not fully understood you. Hi Yury, no problem. It sounds like most of the platform testing you want requires hardware I don't have, unfortunately. - Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
On Thu, Sep 8, 2016 at 2:27 PM, Tom Lane wrote: > Michael Paquier writes: > > But I don't understand the reason behind such a restriction to be > > honest because libxml2 does not depend on libxslt. The contrary is > > true: libxslt needs libxml2. > > Right. > Pretty sure this goes back to *our* XML support requiring both. As in you couldn't turn on/off xslt independently, so the "xml requires xslt" comment is that *our* xml support required both. This may definitely not be true anymore, and that check has just not been updated. Also this was 10 years ago, so I'm of course not 100% sure, but I think it was something like that... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Declarative partitioning - another take
On Wed, Sep 7, 2016 at 3:58 PM, Amit Langote wrote: > Hi, > On 2016/09/07 17:56, Rajkumar Raghuwanshi wrote: > > Hi, > > I have a query regarding list partitioning. For example, if I want to store employee data in a table, with "IT" dept employees in the emp_p1 partition, "HR" dept employees in the emp_p2 partition, and employees belonging to neither of these two in the emp_p3 partition. > > In this case I am not sure how to create the partitioned table. Do we have something like UNBOUNDED for range partitions, or Oracle's "DEFAULT" for list partitions? > > create table employee (empid int, dept varchar) partition by list(dept); > > create table emp_p1 partition of employee for values in ('IT'); > > create table emp_p2 partition of employee for values in ('HR'); > > create table emp_p3 partition of employee for values in (??); > Sorry, no such feature is currently offered. It might be possible to offer something like a "default" list partition which accepts values other than those specified for other existing partitions. However, that means if we add a non-default list partition after a default one has been created, the implementation must make sure that it moves any values from the default partition that now belong to the newly created partition. > Thanks, > Amit

Thanks for clarifying. But I can see the same problem of moving data when adding a new non-default partition with an unbounded range partition. In the example given here, I initially created a partitioned table test with test_p3 as the unbounded end, and later tried to change test_p3 to contain values 7-9 only, adding a new partition test_p4 containing 10-unbounded.

--create partition table and some leafs
CREATE TABLE test (a int, b int) PARTITION BY RANGE(a);
CREATE TABLE test_p1 PARTITION OF test FOR VALUES START (1) END (4);
CREATE TABLE test_p2 PARTITION OF test FOR VALUES START (4) END (7);
CREATE TABLE test_p3 PARTITION OF test FOR VALUES START (7) END UNBOUNDED;

--insert some data
INSERT INTO test SELECT i, i*10 FROM generate_series(1,3) i;
INSERT INTO test SELECT i, i*10 FROM generate_series(4,6) i;
INSERT INTO test SELECT i, i*10 FROM generate_series(7,13) i;

--directly not able to attach test_p4 because of overlap error, hence detached test_p3 and then attached test_p4
SELECT tableoid::regclass,* FROM test;
 tableoid | a  |  b
----------+----+-----
 test_p1  |  1 |  10
 test_p1  |  2 |  20
 test_p1  |  3 |  30
 test_p2  |  4 |  40
 test_p2  |  5 |  50
 test_p2  |  6 |  60
 test_p3  |  7 |  70
 test_p3  |  8 |  80
 test_p3  |  9 |  90
 test_p3  | 10 | 100
 test_p3  | 11 | 110
 test_p3  | 12 | 120
 test_p3  | 13 | 130
(13 rows)

ALTER TABLE test DETACH PARTITION test_p3;
CREATE TABLE test_p4 (like test);
ALTER TABLE test ATTACH PARTITION test_p4 FOR VALUES START (10) END UNBOUNDED;

--now cannot attach test_p3 because of overlap with test_p4, causing data loss from the main test table
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES START (7) END (10);
ERROR: source table contains a row violating partition bound specification
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES START (7) END (13);
ERROR: partition "test_p3" would overlap partition "test_p4"

SELECT tableoid::regclass,* FROM test;
 tableoid | a |  b
----------+---+----
 test_p1  | 1 | 10
 test_p1  | 2 | 20
 test_p1  | 3 | 30
 test_p2  | 4 | 40
 test_p2  | 5 | 50
 test_p2  | 6 | 60
(6 rows)
Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
On Thu, Sep 8, 2016 at 9:27 PM, Tom Lane wrote: > Michael Paquier writes: >> But I don't understand the reason behind such a restriction to be >> honest because libxml2 does not depend on libxslt. The contrary is >> true: libxslt needs libxml2. > > Right. > >> Note as well that libxml2 does depend on ICONV though. > > Hm, is that true either? I don't see any sign of such a dependency > on Linux: > > $ ldd /usr/lib64/libxml2.so.2.7.6 > linux-vdso.so.1 => (0x7fff98f6b000) > libdl.so.2 => /lib64/libdl.so.2 (0x00377a60) > libz.so.1 => /lib64/libz.so.1 (0x00377ae0) > libm.so.6 => /lib64/libm.so.6 (0x003779e0) > libc.so.6 => /lib64/libc.so.6 (0x003779a0) > /lib64/ld-linux-x86-64.so.2 (0x00377960) Peeking into the libxml2 code, there is a configure switch --with-iconv, so that's an optional dependency. And the same exists for Windows in their win32/ stuff. So I am mistaken and this could be just removed. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sample configuration files
Vik Fearing writes: > I noticed that this patch has been marked Waiting on Author with no > comment. Peter, what more should I be doing right now while waiting for > Martín's review? FWIW, I agree with the upthread misgivings about whether this is actually a useful effort. Even if we installed the sample config files somewhere (something there is not consensus for AFAICT), they would not actually *do* anything useful as standalone files. I suppose you are imagining that people would either manually concatenate them onto postgresql.conf or insert an include directive for them into postgresql.conf, but neither of those things sound pleasant or maintainable. Moreover, it's not clear why anyone would do that at all in the age of ALTER SYSTEM SET. I suggest that it'd be more fruitful to view this as a documentation effort; that is, in each contrib module's SGML documentation file provide a standardized section listing all its parameters and their default settings. That would be something that could be copied-and-pasted from into either an editor window on postgresql.conf for the old guard, or an ALTER SYSTEM SET command for the new. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
On Thu, Sep 8, 2016 at 2:53 PM, Michael Paquier wrote: > On Thu, Sep 8, 2016 at 9:39 PM, Magnus Hagander > wrote: > > Pretty sure this goes back to *our* XML support requiring both. As in you > > couldn't turn on/off xslt independently, so the "xml requires xslt" > comment > > is that *our* xml support required both. > > > > This may definitely not be true anymore, and that check has just not been > > updated. > > > > Also this was 10 years ago, so I'm of course not 100% sure, but I think > it > > was something like that... > > Was it for contrib/xml2? For example was it because this module could > not be compiled with just libxml2? > Yes, that's what I'm referring to. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
Michael Paquier writes: > On Thu, Sep 8, 2016 at 9:39 PM, Magnus Hagander wrote: >> Pretty sure this goes back to *our* XML support requiring both. As in you >> couldn't turn on/off xslt independently, so the "xml requires xslt" comment >> is that *our* xml support required both. >> >> This may definitely not be true anymore, and that check has just not been >> updated. >> >> Also this was 10 years ago, so I'm of course not 100% sure, but I think it >> was something like that... > Was it for contrib/xml2? For example was it because this module could > not be compiled with just libxml2? The core code has never used xslt at all. Some quick digging in the git history suggests that contrib/xml2 wasn't very clean about this before 2008: commit eb915caf92a6805740e949c3233ee32bc9676484 Author: Tom Lane Date: Thu May 8 16:49:37 2008 + Fix contrib/xml2 makefile to not override CFLAGS, and in passing make it auto-configure properly for libxslt present or not. or even 2010, depending on how large a value of "work" you want: commit d6a6f8c6be4b6d6a9e90e92d91a83225bfe8d29d Author: Tom Lane Date: Mon Mar 1 18:07:59 2010 + Fix contrib/xml2 so regression test still works when it's built without libxslt. This involves modifying the module to have a stable ABI, that is, the xslt_process() function still exists even without libxslt. It throws a runtime error if called, but doesn't prevent executing the CREATE FUNCTION call. This is a good thing anyway to simplify cross-version upgrades. Both of those fixes postdate our native-Windows port, though I'm not sure of the origin of the specific test you're wondering about. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
On Thu, Sep 8, 2016 at 9:39 PM, Magnus Hagander wrote: > Pretty sure this goes back to *our* XML support requiring both. As in you > couldn't turn on/off xslt independently, so the "xml requires xslt" comment > is that *our* xml support required both. > > This may definitely not be true anymore, and that check has just not been > updated. > > Also this was 10 years ago, so I'm of course not 100% sure, but I think it > was something like that... Was it for contrib/xml2? For example was it because this module could not be compiled with just libxml2? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Dilip Kumar writes: > On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane wrote: >> In the case of record_in, it seems like it'd be easy enough to use >> lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc() >> and then throw a user-facing error right there. > Actually when we are passing an invalid type to "record_in", the error is > thrown in the "lookup_type_cache" function, and this function is not > taking a "no_error" input (it's only limited to > lookup_rowtype_tupdesc_internal). Ah, should have dug a bit deeper. > One option is to pass a "no_error" parameter to the "lookup_type_cache" > function also, and throw the error only in the record_in function. > But the problem is, "lookup_type_cache" has a lot of references. And also > will it be a good idea to throw one generic error instead of multiple > specific errors? We could make it work like that without breaking the ABI if we were to add a NOERROR bit to the allowed "flags". However, after looking around a bit I'm no longer convinced what I said above is a good idea. In particular, if we put the responsibility of reporting the error on callers then we'll lose the ability to distinguish "no such pg_type OID" from "type exists but it's only a shell". So I'm now thinking it's okay to promote lookup_type_cache's elog to a full ereport, especially since it's already using ereport for some of the related error cases such as the shell-type failure. That still leaves us with what to do for domain_in. A really simple fix would be to move the InitDomainConstraintRef call before the getBaseTypeAndTypmod call, because the former fetches the typcache entry and would then benefit from lookup_type_cache's ereport. But that seems a bit magic. I'm tempted to propose that domain_state_setup start with typentry = lookup_type_cache(domainType, TYPECACHE_DOMAIN_INFO); if (typentry->typtype != TYPTYPE_DOMAIN) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("type %s is not a domain", format_type_be(domainType)))); removing the need to verify that after getBaseTypeAndTypmod. The cache loading is basically free because InitDomainConstraintRef would do it anyway; so the extra cost here is only one dynahash search. You could imagine buying back those cycles by teaching the typcache to be able to cache the result of getBaseTypeAndTypmod, but I'm doubtful that we really care. This whole setup sequence only happens once per query anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
Thank you for comments. >But there's a function in startup.c which might be the ideal location >for the check, as it looks like the one and only place where the >autocommit flag actually changes: >static void >autocommit_hook(const char *newval) Thank you for pointing this out. This indeed is the only place where the autocommit setting changes in psql. However, including the check here will require returning the status of the command from this hook, i.e. if we throw an error inside this hook we will need to return false, indicating the value has not changed. This will change the existing definition of the hook, which returns void. This will also lead to changes in other implementations of this hook apart from autocommit_hook. >There are other values than ON: true/yes and their abbreviations, >the value 1, and anything that doesn't resolve to OFF is taken as ON. >ParseVariableBool() in commands.c already does the job of converting >these to bool, the new code should probably just call that function >instead of parsing the value itself. I have included this in the attached patch. Also I have included prior review comments by Rushabh Lathia and Amit Langote in the attached patch. Regarding the suggestion to move the code inside SetVariable(), I think it is not a good idea because SetVariable() and the variable.c file in which it is present are about handling of psql variables, checks for correct variable names, values etc. This check is about the correctness of the command, so it is better placed in exec_command(). Thank you, Rahila Syed On Sat, Sep 3, 2016 at 4:39 PM, Daniel Verite wrote: > Rushabh Lathia wrote: > > > It might happen that SetVariable() can be called from other places in > > future, > > so its good to have all the set variable related checks inside > > SetVariable(). > > There's already another way to skip the \set check: > select 'on' as "AUTOCOMMIT" \gset > > But there's a function in startup.c which might be the ideal location > for the check, as it looks like the one and only place where the > autocommit flag actually changes: > > static void > autocommit_hook(const char *newval) > { > pset.autocommit = ParseVariableBool(newval, "AUTOCOMMIT"); > } > > > > Best regards, > -- > Daniel Vérité > PostgreSQL-powered mailer: http://www.manitou-mail.org > Twitter: @DanielVerite > psql_error_on_autocommit_v2.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
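To illustrate the signature problem: the existing hooks return void, so rejecting a value from inside autocommit_hook would require an API change roughly like the following sketch (hypothetical code, not part of the attached patch; the error condition shown is the one discussed in this thread):

```c
/*
 * Hypothetical bool-returning variant of the hook: returning false would
 * tell the variable machinery to keep the old value.  The real hook in
 * startup.c returns void, so every other hook implementation would need
 * updating to match this signature as well.
 */
static bool
autocommit_hook(const char *newval)
{
	bool		newautocommit = ParseVariableBool(newval, "AUTOCOMMIT");

	if (newautocommit && !pset.autocommit &&
		PQtransactionStatus(pset.db) == PQTRANS_INTRANS)
	{
		psql_error("cannot set AUTOCOMMIT to on inside a transaction\n");
		return false;			/* reject; variable keeps its old value */
	}
	pset.autocommit = newautocommit;
	return true;
}
```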
Re: [HACKERS] Patch: Implement failover on libpq connect level.
Hello, Victor. > I'm sending a new version of the patch. > > I've replaced the readonly option with target_server_type (with JDBC > compatibility alias targetServerType), > > use logic of setting defaults based on the number of hosts in the connect > string instead of a complicated condition in the state machine, > > added checks for return values of memory allocation functions. > > Also, I've solved the pg_usleep problem by linking pgsleep.c into libpq > along with some other libpgport objects which are already linked there. > > Thus client applications don't need to link with libpgport and the libpq > shared library is self-contained. This patch doesn't apply to the master branch: ``` $ git apply ~/temp/libpq-failover-8.patch /home/eax/temp/libpq-failover-8.patch:184: trailing whitespace. check: /home/eax/temp/libpq-failover-8.patch:252: trailing whitespace. /* /home/eax/temp/libpq-failover-8.patch:253: trailing whitespace. * Validate target_server_mode option /home/eax/temp/libpq-failover-8.patch:254: trailing whitespace. */ /home/eax/temp/libpq-failover-8.patch:306: trailing whitespace. appendPQExpBuffer(&conn->errorMessage, error: src/interfaces/libpq/t/001-multihost.pl: already exists in working directory $ git diff ``` -- Best regards, Aleksander Alekseev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical Replication WIP
Hi, On 07/09/16 14:10, Erik Rijkers wrote: On 2016-08-31 22:51, Petr Jelinek wrote: and one more version with bug fixes, improved code docs and couple I am not able to get the replication to work. Would you (or anyone) be so kind as to point out what I am doing wrong? Patches applied, compiled, make-checked, installed OK. I have 2 copies compiled and installed, logical_replication and logical_replication2, to be publisher and subscriber, ports 6972 and 6973 respectively. Logfile subscriber-side: [...] 2016-09-07 13:47:44.441 CEST 21997 LOG: MultiXact member wraparound protections are now enabled 2016-09-07 13:47:44.528 CEST 21986 LOG: database system is ready to accept connections 2016-09-07 13:47:44.529 CEST 22002 LOG: logical replication launcher started 2016-09-07 13:52:11.319 CEST 22143 LOG: logical replication apply for subscription sub1 started 2016-09-07 13:53:47.010 CEST 22143 ERROR: could not open relation with OID 0 2016-09-07 13:53:47.012 CEST 21986 LOG: worker process: logical replication worker 24048 (PID 22143) exited with exit code 1 2016-09-07 13:53:47.018 CEST 22184 LOG: logical replication apply for subscription sub1 started 2016-09-07 13:53:47.028 CEST 22184 ERROR: could not open relation with OID 0 2016-09-07 13:53:47.030 CEST 21986 LOG: worker process: logical replication worker 24048 (PID 22184) exited with exit code 1 2016-09-07 13:53:52.041 CEST 22187 LOG: logical replication apply for subscription sub1 started 2016-09-07 13:53:52.045 CEST 22187 ERROR: could not open relation with OID 0 2016-09-07 13:53:52.046 CEST 21986 LOG: worker process: logical replication worker 24048 (PID 22187) exited with exit code 1 (repeat every few seconds) It means the tables don't exist on the subscriber. I added a check and a proper error message in my local dev branch; it will be part of the next update. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Sep 7, 2016 at 10:18 PM, Claudio Freire wrote: > On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark wrote: > > On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs > wrote: > >> On 6 September 2016 at 19:59, Tom Lane wrote: > >> > >>> The idea of looking to the stats to *guess* about how many tuples are > >>> removable doesn't seem bad at all. But imagining that that's going to > be > >>> exact is folly of the first magnitude. > >> > >> Yes. Bear in mind I had already referred to allowing +10% to be safe, > >> so I think we agree that a reasonably accurate, yet imprecise > >> calculation is possible in most cases. > > > > That would all be well and good if it weren't trivial to do what > > Robert suggested. This is just a large unsorted list that we need to > > iterate through. Just allocate chunks of a few megabytes and when > > it's full allocate a new chunk and keep going. There's no need to get > > tricky with estimates and resizing and whatever. > > I agree. While the idea of estimating the right size sounds promising > a priori, considering the estimate can go wrong and over or > underallocate quite severely, the risks outweigh the benefits when you > consider the alternative of a dynamic allocation strategy. > > Unless the dynamic strategy has a bigger CPU impact than expected, I > believe it's a superior approach. > > How about a completely different representation for the TID array? Now this is probably not something new, but I couldn't find whether the exact same idea was discussed before. I also think it's somewhat orthogonal to what we are trying to do here, and will probably be a bigger change. But I thought I'd mention it since we are on the topic. What I propose is to use a simple bitmap to represent the tuples. If a tuple at a given (block, offset) is dead then the corresponding bit in the bitmap is set. So clearly the searches through dead tuples are O(1) operations, important for very large tables and large arrays. The challenge really is that a heap page can theoretically have MaxOffsetNumber of line pointers (or to be precise, the maximum possible offset number). For an 8K block, that comes to about 2048. Having so many bits per page is neither practical nor optimal. But in practice the largest offset on a heap page should not be significantly greater than MaxHeapTuplesPerPage, which is a more reasonable value of 291 on my machine. Again, that's with zero-sized tuples, and for real-life large tables, with much wider tuples, the number may go down even further. So we cap the offsets represented in the bitmap to some realistic value, computed by looking at page density and then multiplying it by a small factor (not more than two) to take into account LP_DEAD and LP_REDIRECT line pointers. That should practically represent the majority of the dead tuples in the table, but we then keep an overflow area to record tuples beyond the limit set per page. The search routine will do a direct lookup for offsets less than the limit and search in the sorted overflow area for offsets beyond the limit. For example, for a table with a 60 bytes wide tuple (including 24 byte header), each page can approximately have 8192/60 = 136 tuples. Say we provision for 136*2 = 272 bits per page i.e. 34 bytes per page for the bitmap. The first 272 offsets in every page are represented in the bitmap and anything greater than that are in the overflow region. On the other hand, the current representation will need about 16 bytes per page assuming 2% dead tuples, 40 bytes per page assuming 5% dead tuples and 80 bytes assuming 10% dead tuples.
So the bitmap will take more space for small tuples or when vacuum is run very aggressively, both of which seem unlikely for very large tables. Of course the calculation does not take into account the space needed by the overflow area, but I expect that to be small. I guess we can make a choice between the two representations at the start, looking at various table stats. We can also be smart and change from the bitmap to the traditional representation as we scan the table and see many more tuples in the overflow region than we provisioned for. There will be some challenges in converting the representation mid-way, especially in terms of memory allocation, but I think those can be sorted out if we think that the idea has merit. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
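A minimal sketch of the lookup side of this scheme, for concreteness; the struct, helper names and the 272-offset cap are illustrative assumptions, not code from any posted patch:

```c
#include "postgres.h"
#include "storage/itemptr.h"

#define OFFSETS_PER_PAGE 272	/* provisioned bits per heap page (example) */

typedef struct DeadTupleMap
{
	uint8	   *bitmap;			/* OFFSETS_PER_PAGE/8 bytes per heap page */
	ItemPointerData *overflow;	/* sorted TIDs with offset > OFFSETS_PER_PAGE */
	int			noverflow;
} DeadTupleMap;

static int
tid_cmp(const void *a, const void *b)
{
	return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

/* Is the tuple at (blkno, offnum) recorded as dead? */
static bool
dead_tuple_lookup(DeadTupleMap *map, BlockNumber blkno, OffsetNumber offnum)
{
	if (offnum <= OFFSETS_PER_PAGE)
	{
		/* common path, O(1): one bit test in the per-page bitmap */
		uint64		bit = (uint64) blkno * OFFSETS_PER_PAGE + (offnum - 1);

		return (map->bitmap[bit / 8] & (1 << (bit % 8))) != 0;
	}
	else
	{
		/* rare path: binary search of the sorted overflow area */
		ItemPointerData key;

		ItemPointerSet(&key, blkno, offnum);
		return bsearch(&key, map->overflow, map->noverflow,
					   sizeof(ItemPointerData), tid_cmp) != NULL;
	}
}
```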
Re: [HACKERS] Truncating/vacuuming relations on full tablespaces
On Mon, Jun 20, 2016 at 9:28 PM, Asif Naeem wrote: > Thank you for useful suggestions. PFA patch, I have tried to cover all the > points mentioned. The patch needs a rebase, it is failing to apply on latest master. And also there are some pending comment fixes from Robert. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Optimization for lazy_scan_heap
On 9/8/16 3:03 AM, Masahiko Sawada wrote: Current manual vacuum doesn't output how many all_frozen pages we skipped according to the visibility map. That's why I attached the 0001 patch, which makes manual vacuum emit such information. I think we should add that, and info about all-frozen skips, regardless of the rest of these changes. Can you submit that separately to the commitfest? (Or I can if you want.) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Implement failover on libpq connect level.
> Hello, Victor. > > > I'm sending a new version of the patch. > > > > I've replaced the readonly option with target_server_type (with JDBC > > compatibility alias targetServerType), > > > > use logic of setting defaults based on the number of hosts in the connect > > string instead of a complicated condition in the state machine, > > > > added checks for return values of memory allocation functions. > > > > Also, I've solved the pg_usleep problem by linking pgsleep.c into libpq > > along with some other libpgport objects which are already linked there. > > > > Thus client applications don't need to link with libpgport and the libpq > > shared library is self-contained. > > This patch doesn't apply to the master branch: > > ``` > $ git apply ~/temp/libpq-failover-8.patch > /home/eax/temp/libpq-failover-8.patch:184: trailing whitespace. > check: > /home/eax/temp/libpq-failover-8.patch:252: trailing whitespace. > /* > /home/eax/temp/libpq-failover-8.patch:253: trailing whitespace. > * Validate target_server_mode option > /home/eax/temp/libpq-failover-8.patch:254: trailing whitespace. > */ > /home/eax/temp/libpq-failover-8.patch:306: trailing whitespace. > appendPQExpBuffer(&conn->errorMessage, > error: src/interfaces/libpq/t/001-multihost.pl: already exists in > working directory > > $ git diff > ``` Oops. Sorry, my mistake. I forgot to check for untracked files. Everything is fine. -- Best regards, Aleksander Alekseev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On 9/8/16 3:48 AM, Masahiko Sawada wrote: If we replaced dead_tuples with an array-of-arrays, isn't there a negative performance impact for lazy_tid_reap()? As chunks are added, that performance would decrease. Yes, it certainly would, as you'd have to do 2 binary searches. I'm not sure how much that matters though; presumably the index scans are normally IO-bound? Another option would be to use the size estimation ideas others have mentioned to create one array. If the estimates prove to be wrong you could then create a single additional segment; by that point you should have a better idea of how far off the original estimate was. That means the added search cost would only be a compare and a second pointer redirect. Something else that occurred to me... AFAIK the only reason we don't support syncscan with VACUUM is because it would require sorting the TID list. If we just added a second TID list we would be able to support syncscan, swapping over to the 'low' list when we hit the end of the relation. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
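That "one estimated array plus a single overflow segment" lookup is cheap to express in code. A sketch, assuming the overflow segment is only appended after the main one fills, so every TID in it sorts above the main segment's entries (all names here are illustrative, not from a posted patch):

```c
#include "postgres.h"
#include "storage/itemptr.h"

typedef struct DeadTuples
{
	ItemPointerData *main;		/* sized from the stats-based estimate */
	int			nmain;
	ItemPointerData *extra;		/* allocated only if the estimate was low */
	int			nextra;
} DeadTuples;

static int
tid_cmp(const void *a, const void *b)
{
	return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

/* lazy_tid_reap()-style lookup: one compare plus one binary search */
static bool
tid_is_dead(DeadTuples *dt, ItemPointer tid)
{
	/*
	 * The single extra compare decides which segment to search, so the
	 * added cost over one flat array is just this test and a pointer
	 * redirect, as described above.
	 */
	if (dt->nextra > 0 && ItemPointerCompare(tid, &dt->extra[0]) >= 0)
		return bsearch(tid, dt->extra, dt->nextra,
					   sizeof(ItemPointerData), tid_cmp) != NULL;

	return bsearch(tid, dt->main, dt->nmain,
				   sizeof(ItemPointerData), tid_cmp) != NULL;
}
```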
Re: [HACKERS] to_date_valid()
On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote:
> postgres=# SELECT to_date('2011 12 18', 'YYYY MM  DD');
>    to_date
> ------------
>  2011-12-08
> (1 row)
>
> That is from the regression tests, and obviously handles the date
> transformation wrong. My attempt catches this, because I compare the
> date with the input date, and do not rely on a valid date only.

It's debatable what is correct here. Using to_number, the behavior appears to be that a space in the pattern ignores one character. For example:

test=# select to_number('123 456', '999 999');
 to_number
-----------
    123456

test=# select to_number('123456', '999 999');
 to_number
-----------
     12356

Considering that, the above to_date result is not incorrect. So just squashing the spaces and converting the value back is not a correct approach to detecting overflow. I think using ValidateDate() was the right idea. That is what we use for checking date validity everywhere else. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for lazy_scan_heap
On Thu, Sep 8, 2016 at 4:03 AM, Masahiko Sawada wrote: > On Wed, Sep 7, 2016 at 4:11 PM, Simon Riggs wrote: >> On 7 September 2016 at 04:13, Masahiko Sawada wrote: >> >>> Since current HEAD could scan the visibility map twice, the execution time >>> of Patched is approximately half of HEAD. >> >> Sounds good. >> >> To ensure we are doing exactly the same amount of work as before, did you >> see the output of VACUUM VERBOSE? > > Sorry, the previous test result I posted was wrong. > I reran the performance test and the results are: > > * 1TB Table(visibility map size is 32MB) > HEAD : 4853.250 ms (00:04.853) > Patched : 3805.789 ms (00:03.806) > > * 8TB Table(visibility map size is 257MB) > HEAD : 37853.891 ms (00:37.854) > Patched : 30153.943 ms (00:30.154) > > * 32TB Table(visibility map size is 1GB) > HEAD: 151908.344 ms (02:31.908) > Patched: 120560.037 ms (02:00.560) > > Since the visibility map page can be cached in shared buffers or the OS cache > by the first scan, the benefit of this patch seems not to be large. Yeah, that's not nearly as good as the first set of results. Maybe it's still worth doing, but if you need to have a 32TB table to save as much as 30 seconds, we don't have much of a problem here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
Peter Eisentraut writes: > On 9/6/16 1:40 PM, Doug Doole wrote: >> We carried the ICU version numbers around on our collation and locale >> IDs (such as fr_FR%icu36). The database would load multiple versions of >> the ICU library so that something created with ICU 3.6 would always be >> processed with ICU 3.6. This avoided the problems of trying to change >> the rules on the user. (We'd always intended to provide tooling to allow >> the user to move an existing object up to a newer version of ICU, but we >> never got around to doing it.) In the code, this meant we were >> explicitly calling the versioned API so that we could keep the calls >> straight. > I understand that in principle, but I don't see operating system > providers shipping a bunch of ICU versions to facilitate that. They > will usually ship one. I agree with that estimate, and I would further venture that even if we wanted to bundle ICU into our tarballs, distributors would rip it out again on security grounds. I am dead certain Red Hat would do so; less sure that other vendors have similar policies, but it seems likely. They don't want to have to fix security bugs in more than one place. This is a problem, if ICU won't guarantee cross-version compatibility, because it destroys the argument that moving to ICU would offer us collation behavior stability. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Thu, Sep 8, 2016 at 11:54 AM, Pavan Deolasee wrote: > For example, for a table with a 60 bytes wide tuple (including 24 byte > header), each page can approximately have 8192/60 = 136 tuples. Say we > provision for 136*2 = 272 bits per page i.e. 34 bytes per page for the > bitmap. The first 272 offsets in every page are represented in the bitmap and > anything greater than that are in the overflow region. On the other hand, the current > representation will need about 16 bytes per page assuming 2% dead tuples, 40 > bytes per page assuming 5% dead tuples and 80 bytes assuming 10% dead > tuples. So the bitmap will take more space for small tuples or when vacuum is > run very aggressively, both of which seem unlikely for very large tables. Of course > the calculation does not take into account the space needed by the overflow > area, but I expect that to be small. I thought about something like this, but it could be extremely inefficient for mostly frozen tables, since the bitmap cannot account for frozen pages without losing the O(1) lookup characteristic. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for lazy_scan_heap
On Thu, Sep 8, 2016 at 11:27 PM, Jim Nasby wrote: > On 9/8/16 3:03 AM, Masahiko Sawada wrote: >> >> Current manual vacuum doesn't output how many all_frozen pages we >> skipped according to the visibility map. >> That's why I attached the 0001 patch which makes manual vacuum emit >> such information. > > > I think we should add that, and info about all-frozen skips, regardless of > the rest of these changes. Can you submit that separately to the commitfest? > (Or I can if you want.) Yeah, autovacuum emits such information but manual vacuum doesn't. I submitted that patch before[1] but the patch was not committed, because the verbose output is already complicated. But I will submit it again and start a discussion about that. [1] https://www.postgresql.org/message-id/CAD21AoCQGTKDxQ6YfrJ0%2B%3Dev6A3i3pt2ULdWxCtwPLKR2E77jg%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CF app and patch series
Craig Ringer wrote: > Hi all > > Now that it's becoming more common to post patch series, not just > standalone patches, it might be worth looking at how the CF app can > help manage them. > > Any ideas? I agree that we don't consider this case at all currently and that it's causing some pain. MIME parsing is mind-boggling. Trying to figure out *one* patch file from an email is already pretty difficult. Trying to figure out more than one might be nightmarish. Maybe Magnus will contradict me and say it's trivial to do -- that'd be great. I don't have any great ideas for how to support this; I'd say it would be something like the CF entry having sub-entries, one for each patch in the series, that can be closed independently. This probably involves surgery to the CF database and app which I'm not volunteering to write, however. Django-enabled contributors speak up now ... I think for the time being we should keep the single entry as "needs review" or whatever open state until there is nothing left in committable state, then close it as "committed"; the parts that were not committed can be resent as a new series to a later CF. In the annotations section, add a link to the latest version each time it's posted, along with some indication of how many parts are left[1]. Perhaps if a part is committed and there's no new version submitted soon, also add an annotation to indicate that. [1] Do not, as I've seen some do in the past, delete the old annotations while adding new ones. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Tue, Sep 6, 2016 at 8:28 PM, Peter Geoghegan wrote: >> In the meanwhile, I'll go and do some perf testing. >> >> Assuming the speedup is realized during testing, LGTM. > > Thanks. I suggest spending at least as much time on unsympathetic > cases (e.g., only 2 or 3 tapes must be merged). At the same time, I > suggest focusing on a type that has relatively expensive comparisons, > such as collated text, to make differences clearer. The tests are still running (the benchmark script I came up with runs for a lot longer than I anticipated, about 2 days), but preliminary results are very promising, I can see a clear and consistent speedup. We'll have to wait for the complete results to see if there's any significant regression, though. I'll post the full results when I have them, but until now it all looks like this: setup: create table lotsofitext(i text, j text, w text, z integer, z2 bigint); insert into lotsofitext select cast(random() * 10.0 as text) || 'blablablawblabla', cast(random() * 10.0 as text) || 'blablablawjjjblabla', cast(random() * 10.0 as text) || 'blablabl awwwabla', random() * 10.0, random() * 1.0 from generate_series(1, 1000); timed: select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t; Unpatched Time: 100351.251 ms Patched Time: 75180.787 ms That's like a 25% speedup on random input. As we say over here, rather badly translated, not a turkey's boogers (meaning "nice!") On Tue, Sep 6, 2016 at 9:50 PM, Claudio Freire wrote: > On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan wrote: >> On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire >> wrote: >>> I noticed, but here n = state->memtupcount >>> >>> + Assert(memtuples[0].tupindex == newtup->tupindex); >>> + >>> + CHECK_FOR_INTERRUPTS(); >>> + >>> + n = state->memtupcount; /* n is heap's size, >>> including old root */ >>> + imin = 0; /* >>> start with caller's "hole" in root */ >>> + i = imin; >> >> I'm fine with using "n" in the later assertion you mentioned, if >> that's clearer to you. memtupcount is broken out as "n" simply because >> that's less verbose, in a place where that makes things far clearer. >> >>> In fact, the assert on the patch would allow writing memtuples outside >>> the heap, as in calling tuplesort_heap_root_displace if >>> memtupcount==0, but I don't think that should be legal (memtuples[0] >>> == memtuples[imin] would be outside the heap). >> >> You have to have a valid heap (i.e. there must be at least one >> element) to call tuplesort_heap_root_displace(), and it doesn't >> directly compact the heap, so it must remain valid on return. The >> assertion exists to make sure that everything is okay with a >> one-element heap, a case which is quite possible. > > More than using "n" or "memtupcount" what I'm saying is to assert that > memtuples[imin] is inside the heap, which would catch the same errors > the original assert would, and more. > > Assert(imin < state->memtupcount) > > If you prefer. > > The original assert allows any value of imin for memtupcount>1, and > that's my main concern. It shouldn't.
So, for the assertions to properly avoid clobbering/reading out of bounds memory, you need both the above assert: + */ + memtuples[i] = memtuples[imin]; + i = imin; + } + + Assert(imin < state->memtupcount); + memtuples[imin] = *newtup; +} And another one at the beginning, asserting: + SortTuple *memtuples = state->memtuples; + int n, + imin, + i; + + Assert(state->memtupcount > 0 && memtuples[0].tupindex == newtup->tupindex); + + CHECK_FOR_INTERRUPTS(); It's worth making that change, IMHO, unless I'm missing something. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
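Putting both assertions in context, the displace routine under discussion would look roughly as follows. This is a sketch reconstructed from the fragments quoted above, not the committed code; COMPARETUP stands in for tuplesort.c's comparison macro, and the real patch also folds run numbers (tupindex) into the comparison:

```c
static void
tuplesort_heap_root_displace(Tuplesortstate *state, SortTuple *newtup)
{
	SortTuple  *memtuples = state->memtuples;
	int			n,
				imin,
				i;

	/* heap must be non-empty, and newtup must belong to the current run */
	Assert(state->memtupcount > 0 &&
		   memtuples[0].tupindex == newtup->tupindex);

	CHECK_FOR_INTERRUPTS();

	n = state->memtupcount;		/* n is heap's size, including old root */
	imin = 0;					/* start with caller's "hole" in root */
	i = imin;
	for (;;)
	{
		int			j = 2 * i + 1;	/* left child of the hole */

		if (j >= n)
			break;
		/* pick the smaller of the two children */
		if (j + 1 < n &&
			COMPARETUP(state, &memtuples[j + 1], &memtuples[j]) < 0)
			j++;
		/* stop once newtup sorts at or before the smaller child */
		if (COMPARETUP(state, newtup, &memtuples[j]) <= 0)
			break;
		/* move the child up into the hole and descend */
		memtuples[i] = memtuples[j];
		i = j;
		imin = i;
	}

	/* the final hole must still be inside the heap */
	Assert(imin < state->memtupcount);
	memtuples[imin] = *newtup;
}
```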
Re: [HACKERS] [PATCH v2] Add overflow checks to money type input function
I have updated the patch with additional tests and comments per your review. Final(?) version attached. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From ee34d7d64a4b10c9f7fbe8c905a56cea1584c8c9 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 8 Sep 2016 12:00:00 -0400 Subject: [PATCH v3] Add overflow checks to money type input function The money type input function did not have any overflow checks at all. There were some regression tests that purported to check for overflow, but they actually checked for the overflow behavior of the int8 type before casting to money. Remove those unnecessary checks and add some that actually check the money input function. Reviewed-by: Fabien COELHO --- src/backend/utils/adt/cash.c| 53 ++-- src/test/regress/expected/money.out | 98 ++--- src/test/regress/sql/money.sql | 30 ++-- 3 files changed, 165 insertions(+), 16 deletions(-) diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c index b336185..a146b0a 100644 --- a/src/backend/utils/adt/cash.c +++ b/src/backend/utils/adt/cash.c @@ -189,13 +189,30 @@ cash_in(PG_FUNCTION_ARGS) printf("cashin- string is '%s'\n", s); #endif + /* + * We accumulate the absolute amount in "value" and then apply the sign at + * the end. (The sign can appear before or after the digits, so it would + * be more complicated to do otherwise.) Because of the larger range of + * negative signed integers, we build "value" in the negative and then + * flip the sign at the end, catching most-negative-number overflow if + * necessary. + */ + for (; *s; s++) { /* we look for digits as long as we have found less */ /* than the required number of decimal places */ if (isdigit((unsigned char) *s) && (!seen_dot || dec < fpoint)) { - value = (value * 10) + (*s - '0'); + Cash newvalue = (value * 10) - (*s - '0'); + + if (newvalue / 10 != value) +ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", +str))); + + value = newvalue; if (seen_dot) dec++; @@ -214,11 +231,27 @@ cash_in(PG_FUNCTION_ARGS) /* round off if there's another digit */ if (isdigit((unsigned char) *s) && *s >= '5') - value++; + value--; /* remember we build the value in the negative */ + + if (value > 0) + ereport(ERROR, +(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", + str))); /* adjust for less than required decimal places */ for (; dec < fpoint; dec++) - value *= 10; + { + Cash newvalue = value * 10; + + if (newvalue / 10 != value) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", + str))); + + value = newvalue; + } /* * should only be trailing digits followed by whitespace, right paren, @@ -247,7 +280,19 @@ cash_in(PG_FUNCTION_ARGS) str))); } - result = value * sgn; + /* If the value is supposed to be positive, flip the sign, but check for + * the most negative number.
*/ + if (sgn > 0) + { + result = -value; + if (result < 0) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", + str))); + } + else + result = value; #ifdef CASHDEBUG printf("cashin- result is " INT64_FORMAT "\n", result); diff --git a/src/test/regress/expected/money.out b/src/test/regress/expected/money.out index 538235c..5695f87 100644 --- a/src/test/regress/expected/money.out +++ b/src/test/regress/expected/money.out @@ -185,6 +185,96 @@ SELECT * FROM money_data; $123.46 (1 row) +-- input checks +SELECT '1234567890'::money; + money +--- + $1,234,567,890.00 +(1 row) + +SELECT '12345678901234567'::money; + money + + $12,345,678,901,234,567.00 +(1 row) + +SELECT '123456789012345678'::money; +ERROR: value "123456789012345678" is out of range for type money +LINE 1: SELECT '123456789012345678'::money; + ^ +SELECT '9223372036854775807'::money; +ERROR: value "9223372036854775807" is out of range for type money +LINE 1: SELECT '9223372036854775807'::money; + ^ +SELECT '-12345'::money; +money +- + -$12,345.00 +(1 row) + +SELECT '-1234567890'::money; + money + + -$1,234,567,890.00 +(1 row) + +SELECT '-12345678901234567'::money; +money +- + -$12,345,678,901,234,567.00 +(1 row) + +SELECT '-123456789012345678'::money; +ERROR: value "-123456789012345678" is out of range for type money +LINE 1: SELECT '-123456789012345678'::money; + ^ +SELECT
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
On 9/6/16 1:42 PM, Robert Haas wrote: > If we were talking about pathnames containing spaces, I would agree, > but I've never heard of a legitimate pathname containing CR or LF. I > can't see us losing much by refusing to allow such pathnames, except > for security holes. The flip side of that is that if we're doing a half-way job of only prohibiting these characters in 67% of cases, then a new generation of tools will be written on top of that with the assumption that these characters cannot appear. But then those tools will be easy to break or exploit because it's possible to sneak stuff in in creative ways. So we're on the road to having an endless stream of "I can sneak in a CR/LF character in here" bugs. The current setup is more robust: We are prohibiting these characters in specific locations where we know we can't handle them. But we don't give any guarantees about anything else. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] feature request: explain "with details" option
My apologies if this was already requested before... I think it would be fantastic if postgres had an "explain the explain" option: Today's explain tells us what loops and scans were used, and relative costs, etc. It doesn't seem to tell *why* the planner elected to use what it did. For instance, in the case of a corrupted index, it doesn't say why it's not using that index, it just doesn't use it, causing some confusion to end users. At least causing confusion to me. Or in the case of where it iterates over an entire table (seq. scan) instead of using an index because the index range specified "is most of the table" (thus not helpful to use the index)...The choice is appropriate. The reasoning why is not explicitly mentioned. Again causing possibility for some delay as you try to "decipher the mind" of the planner. Sometimes tables (ex: tables after having been first propagated) need an "analyze" run on them, but it's not clear from an "explain" output that the analyze statistics are faulty. Not even a hint. So this is a feature request for an "EXPLAIN DETAILS" option or something, basically like today's explain but with more "rationale" included. This could be immensely useful to many Postgres users. I'd even be willing to chip in a couple hundred bucks if it would help grease the wheels for somebody taking up the challenge if that helps at all :) Thank you for your consideration in this regard. -roger- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
Peter Geoghegan writes: > On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas wrote: >> I still think tuplesort_heap_siftup is a confusing name, although I'm not >> sure that Peter's "compact" is much better. I suggest that we rename it to >> tuplesort_heap_delete_top(). In comments within the function, explain that >> the *loop* corresponds to the "siftup" in Knuth's book. > I'm also fine with that name. I can live with it too. >> Interestingly, as Knuth explains the siftup algorithm, it performs a >> "replace the top" operation, rather than "remove the top". But we use it to >> remove the top, by moving the last element to the top and running the >> algorithm after that. Peter's other patch, to change the way we currently >> replace the top node, to do it in one pass rather than as delete+insert, is >> actually Knuth's siftup algorithm. > Knuth must have a strange sense of humor. I'm too lazy to get the book back down off the shelf to check, but I think that Knuth uses sift-up to describe two different algorithms that have essentially the same inner loop. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Default make target in test modules
Tom Lane wrote: > Alvaro Herrera writes: > > I suppose this is a very easy mistake to make -- but also fortunately an > > easy one to correct. Do you want me to fix the affected modules? > > I was going to do it, but if you want to, feel free. Done. > (BTW, I notice that test_pg_dump's Makefile misquotes its own path name, > probably copy-paste sloppiness.) Uh, I hadn't noticed. Fixed that one too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Thu, Sep 8, 2016 at 11:54 PM, Pavan Deolasee wrote: > > > On Wed, Sep 7, 2016 at 10:18 PM, Claudio Freire > wrote: >> >> On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark wrote: >> > On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs >> > wrote: >> >> On 6 September 2016 at 19:59, Tom Lane wrote: >> >> >> >>> The idea of looking to the stats to *guess* about how many tuples are >> >>> removable doesn't seem bad at all. But imagining that that's going to >> >>> be >> >>> exact is folly of the first magnitude. >> >> >> >> Yes. Bear in mind I had already referred to allowing +10% to be safe, >> >> so I think we agree that a reasonably accurate, yet imprecise >> >> calculation is possible in most cases. >> > >> > That would all be well and good if it weren't trivial to do what >> > Robert suggested. This is just a large unsorted list that we need to >> > iterate through. Just allocate chunks of a few megabytes and when >> > it's full allocate a new chunk and keep going. There's no need to get >> > tricky with estimates and resizing and whatever. >> >> I agree. While the idea of estimating the right size sounds promising >> a priori, considering the estimate can go wrong and over- or >> underallocate quite severely, the risks outweigh the benefits when you >> consider the alternative of a dynamic allocation strategy. >> >> Unless the dynamic strategy has a bigger CPU impact than expected, I >> believe it's a superior approach. >> > > How about a completely different representation for the TID array? Now this > is probably not something new, but I couldn't find if the exact same idea > was discussed before. I also think it's somewhat orthogonal to what we are > trying to do here, and will probably be a bigger change. But I thought I'd > mention it since we are on the topic. > > What I propose is to use a simple bitmap to represent the tuples. If the tuple > at a given (block, offset) is dead then the corresponding bit in the bitmap is set. > So clearly the searches through dead tuples are O(1) operations, important > for very large tables and large arrays. > > The real challenge is that a heap page can theoretically have MaxOffsetNumber > of line pointers (or, to be precise, the maximum possible offset number). For an 8K > block, that comes to about 2048. Having so many bits per page is neither > practical nor optimal. But in practice the largest offset on a heap page > should not be significantly greater than MaxHeapTuplesPerPage, which is a > more reasonable value of 291 on my machine. Again, that's with zero-sized > tuples, and for real-life large tables with much wider tuples the number may > go down even further. > > So we cap the offsets represented in the bitmap to some realistic value, > computed by looking at page density and then multiplying it by a small > factor (not more than two) to take into account LP_DEAD and LP_REDIRECT line > pointers. That should practically represent the majority of the dead tuples in > the table, but we then keep an overflow area to record tuples beyond the > limit set per page. The search routine will do a direct lookup for > offsets less than the limit and search in the sorted overflow area for > offsets beyond the limit. > > For example, for a table with a 60-byte-wide tuple (including the 24-byte > header), each page can approximately have 8192/60 = 136 tuples. Say we > provision for 136*2 = 272 bits per page, i.e. 34 bytes per page for the > bitmap. The first 272 offsets in every page are represented in the bitmap and > anything greater than that is in the overflow region.
On the other hand, the current > representation will need about 16 bytes per page assuming 2% dead tuples, 40 > bytes per page assuming 5% dead tuples and 80 bytes assuming 10% dead > tuples. So the bitmap will take more space for small tuples or when vacuum is > run very aggressively, both of which seem unlikely for very large tables. Of course > the calculation does not take into account the space needed by the overflow > area, but I expect that to be small. > > I guess we can make a choice between the two representations at the start, > looking at various table stats. We can also be smart and change from bitmap > to traditional representation as we scan the table and see many more tuples > in the overflow region than we provisioned for. There will be some > challenges in converting representation mid-way, especially in terms of > memory allocation, but I think those can be sorted out if we think that the > idea has merit. > Making vacuum able to choose between two data representations sounds good. I previously implemented a patch that changes the dead tuple representation to a bitmap. I will measure the performance of the bitmap representation again and post the results. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
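A minimal sketch of the lookup this scheme implies (names and layout are hypothetical, not from any posted patch; assumes the usual PostgreSQL typedefs):

    typedef struct DeadTidMap
    {
        int         bits_per_page;      /* provisioned cap, e.g. 272 */
        uint8      *bitmap;             /* nblocks * bits_per_page bits */
        ItemPointerData *overflow;      /* sorted; offsets above the cap */
        int         noverflow;
    } DeadTidMap;

    static bool
    dead_tid_lookup(DeadTidMap *map, BlockNumber blk, OffsetNumber off)
    {
        if (off <= map->bits_per_page)
        {
            uint64      bit = (uint64) blk * map->bits_per_page + (off - 1);

            /* the common case: O(1), no search at all */
            return (map->bitmap[bit / 8] & (1 << (bit % 8))) != 0;
        }
        /* rare case: binary-search the sorted overflow array, much like
         * today's lazy_tid_reaped(); itemptr_bsearch() is hypothetical */
        return itemptr_bsearch(map->overflow, map->noverflow, blk, off);
    }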
Re: [HACKERS] Default make target in test modules
* Tom Lane (t...@sss.pgh.pa.us) wrote: > I happened to notice that if you type "make" in > src/test/modules/test_pg_dump, you will get a "make check" action > not "make all". I hope this is just somebody being thoughtless > about Makefile ordering and not an intentional override of the > default make target. If the latter, I beg to differ about it > being a good idea. No, it wasn't intentional. > > Done. > > Thanks. Thanks to you both! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Add support for restrictive RLS policies
Greetings! * Stephen Frost (sfr...@snowman.net) wrote: > Based on Robert's suggestion and using Thom's verbiage, I've tested this > out: > > CREATE POLICY pol ON tab AS [PERMISSIVE|RESTRICTIVE] ... > > and it appears to work fine with the grammar, etc. > > Unless there's other thoughts on this, I'll update the patch to reflect > this grammar in a couple days. Updated patch attached which uses the above approach, includes some initial documentation, and has fixes for the tab completion. I'm planning to add more documentation. Otherwise, testing and code review would certainly be appreciated. Thanks! Stephen From 6fad316de3cc50f4cc2c3bbe3c6fac91afc881e6 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Thu, 1 Sep 2016 02:11:30 -0400 Subject: [PATCH] Add support for restrictive RLS policies We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammar, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. --- doc/src/sgml/ref/create_policy.sgml | 16 +++ src/backend/commands/policy.c | 9 ++ src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/parser/gram.y | 32 +++-- src/backend/rewrite/rowsecurity.c | 7 +- src/bin/pg_dump/pg_dump.c | 38 -- src/bin/pg_dump/pg_dump.h | 1 + src/bin/pg_dump/t/002_pg_dump.pl | 39 +- src/bin/psql/describe.c | 109 src/bin/psql/tab-complete.c | 29 - src/include/catalog/pg_policy.h | 16 ++- src/include/nodes/parsenodes.h| 1 + src/include/parser/kwlist.h | 2 + src/include/rewrite/rowsecurity.h | 1 + src/test/regress/expected/rowsecurity.out | 209 ++ src/test/regress/sql/rowsecurity.sql | 24 +++- 17 files changed, 417 insertions(+), 118 deletions(-) diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml index 89d2787..d930052 100644 --- a/doc/src/sgml/ref/create_policy.sgml +++ b/doc/src/sgml/ref/create_policy.sgml @@ -22,6 +22,7 @@ PostgreSQL documentation CREATE POLICY name ON table_name +[ AS ( PERMISSIVE | RESTRICTIVE ) ] [ FOR { ALL | SELECT | INSERT | UPDATE | DELETE } ] [ TO { role_name | PUBLIC | CURRENT_USER | SESSION_USER } [, ...] ] [ USING ( using_expression ) ] @@ -120,6 +121,21 @@ CREATE POLICY name ON +permissive + + + If the policy is a "permissive" or "restrictive" policy. Permissive + policies are the default and always add visibility - they are "OR"d + together to allow the user access to all rows through any of the + permissive policies they have access to. Alternatively, a policy can + instead be "restrictive", meaning that the policy will be "AND"d + with other restrictive policies and with the result of all of the + permissive policies on the table.
+ + + + + command diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index d694cf8..70e22c1 100644 --- a/src/backend/commands/policy.c +++ b/src/backend/commands/policy.c @@ -235,6 +235,7 @@ RelationBuildRowSecurity(Relation relation) { Datum value_datum; char cmd_value; + bool permissive_value; Datum roles_datum; char *qual_value; Expr *qual_expr; @@ -257,6 +258,12 @@ RelationBuildRowSecurity(Relation relation) Assert(!isnull); cmd_value = DatumGetChar(value_datum); + /* Get policy permissive or restrictive */ + value_datum = heap_getattr(tuple, Anum_pg_policy_polpermissive, + RelationGetDescr(catalog), &isnull); + Assert(!isnull); + permissive_value = DatumGetBool(value_datum); + /* Get policy name */ value_datum = heap_getattr(tuple, Anum_pg_policy_polname, RelationGetDescr(catalog), &isnull); @@ -298,6 +305,7 @@ RelationBuildRowSecurity(Relation relation) policy = palloc0(sizeof(RowSecurityPolicy)); policy->policy_name = pstrdup(policy_name_value); policy->polcmd = cmd_value; + policy->permissive = permissive_value; policy->roles = DatumGetArrayTypePCopy(roles_datum); policy->qual = copyObject(qual_expr); policy->with_check_qual = copyObject(with_check_qual); @@ -796,6 +804,7 @@ CreatePolicy(CreatePolicyStmt *stmt) values[Anum_pg_policy_polname - 1] = DirectFunctionCall1(namein, CStringGetDatum(stmt->policy_name)); values[Anum_pg_policy_polcmd - 1] = CharGetDatum(polcmd); + values[Anum_pg_policy_polpermissive - 1] = BoolGetDatum(stmt->permissive); values[Anum_pg_policy_polroles - 1] = PointerGetDatum(role_ids); /* Add qual if present. */ diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index
[HACKERS] Default make target in test modules
I happened to notice that if you type "make" in src/test/modules/test_pg_dump, you will get a "make check" action not "make all". I hope this is just somebody being thoughtless about Makefile ordering and not an intentional override of the default make target. If the latter, I beg to differ about it being a good idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Sep 8, 2016 at 8:53 AM, Claudio Freire wrote: > setup: > > create table lotsofitext(i text, j text, w text, z integer, z2 bigint); > insert into lotsofitext select cast(random() * 10.0 as text) > || 'blablablawblabla', cast(random() * 10.0 as text) || > 'blablablawjjjblabla', cast(random() * 10.0 as text) || > 'blablabl > awwwabla', random() * 10.0, random() * 1.0 from > generate_series(1, 1000); > > timed: > > select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t; > > Unpatched Time: 100351.251 ms > Patched Time: 75180.787 ms > > That's like a 25% speedup on random input. As we say over here, rather > badly translated, not a turkey's boogers (meaning "nice!") Cool! What work_mem setting were you using here? >> More than using "n" or "memtupcount" what I'm saying is to assert that >> memtuples[imin] is inside the heap, which would catch the same errors >> the original assert would, and more. >> >> Assert(imin < state->memtupcount) >> >> If you prefer. >> >> The original asserts allows any value of imin for memtupcount>1, and >> that's my main concern. It shouldn't. > > So, for the assertions to properly avoid clobbering/reading out of > bounds memory, you need both the above assert: > > +*/ > + memtuples[i] = memtuples[imin]; > + i = imin; > + } > + >>+ Assert(imin < state->memtupcount); > + memtuples[imin] = *newtup; > +} > > And another one at the beginning, asserting: > > + SortTuple *memtuples = state->memtuples; > + int n, > + imin, > + i; > + >>+ Assert(state->memtupcount > 0 && memtuples[0].tupindex == >>newtup->tupindex); > + > + CHECK_FOR_INTERRUPTS(); > > It's worth making that change, IMHO, unless I'm missing something. You're supposed to just not call it with an empty heap, so the assertions trust that much. I'll look into that. Currently, I'm producing a new revision of this entire patchset. Improving the cost model (used when the parallel_workers storage parameter is not specified within CREATE INDEX) is taking a bit of time, but I hope to have it out in the next couple of days. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Make initdb's suggested "pg_ctl start" command line more reliabl
On 9/6/16 1:08 PM, Tom Lane wrote: >> As just mentioned elsewhere, this accidentally introduces a failure if >> > the PostgreSQL installation path contains LF/CR, because of the use of >> > appendShellString(). > I think that's intentional, not accidental. What actual use case is > there for allowing such paths? There probably isn't one. But we ought to be introducing this change in a more intentional and consistent way. For example, pg_basebackup has no such restriction. So using pg_basebackup, then promote, then pg_upgrade will (probably) fail now for some paths. More generally, I'm concerned that appendShellString() looks pretty attractive for future use. It's not inconceivable that someone will want to use it for say calling pg_dump from pg_dumpall or pg_upgrade at some point, and then maybe we'll accidentally disallow LF/CR in tablespace names, say. Also, if we're concerned about the potential for confusion that these characters can cause, maybe we should be disallowing more control characters in similar places. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Thu, Sep 8, 2016 at 8:42 PM, Claudio Freire wrote: > On Thu, Sep 8, 2016 at 11:54 AM, Pavan Deolasee > wrote: > > For example, for a table with a 60-byte-wide tuple (including the 24-byte > > header), each page can approximately have 8192/60 = 136 tuples. Say we > > provision for 136*2 = 272 bits per page, i.e. 34 bytes per page for the > > bitmap. The first 272 offsets in every page are represented in the bitmap and > > anything greater than that is in the overflow region. On the other hand, the > current > > representation will need about 16 bytes per page assuming 2% dead > tuples, 40 > > bytes per page assuming 5% dead tuples and 80 bytes assuming 10% dead > > tuples. So the bitmap will take more space for small tuples or when vacuum is > > run very aggressively, both of which seem unlikely for very large tables. Of > course > > the calculation does not take into account the space needed by the > overflow > > area, but I expect that to be small. > > I thought about something like this, but it could be extremely > inefficient for mostly frozen tables, since the bitmap cannot account > for frozen pages without losing the O(1) lookup characteristic > Well, that's correct. But I thought the whole point is the case where there is a large number of dead tuples, which requires a lot of memory. If my math was correct as explained above, then even at 5% dead tuples, the bitmap representation will consume approximately the same memory but provide O(1) search time. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Sep 8, 2016 at 10:18 AM, Claudio Freire wrote: > That particular case I believe is using work_mem=4MB, easy strings, 1.5GB > table. Cool. I wonder where this leaves Heikki's draft patch, that completely removes batch memory, etc. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove superuser() checks from pgstattuple
On 9/4/16 7:36 PM, Stephen Frost wrote: > Perhaps if the versioned install script was actually a non-versioned > install script in the source tree, and the Makefile simply installed it > into the correct place, then we could eliminate that part. (All very > hand-wavy, I've not looked at what it'd take to do.) A number of external extensions actually do it that way, but it also has its problems. For example, if you don't preserve the old base install scripts, you can't actually test whether your upgrade scripts work in the sense that they upgrade you to the same place. This is now being obsoleted by the later idea of allowing base installs from a chain of upgrade scripts. But if your upgrade scripts contain ALTER TABLE commands, you will probably still want to write base install scripts that do a fresh CREATE TABLE instead. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DISABLE_PAGE_SKIPPING option for vacuumdb
Hi, The attached patch adds a --disable-page-skipping option to the vacuumdb command for 9.6. If by any chance the freeze map causes a serious data corruption bug, then the user will want to run pg_check_visible() and VACUUM with the DISABLE_PAGE_SKIPPING option on the multiple tables having the problem. In this case, I think that this option for vacuumdb would be convenient. Please give me feedback. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center disable_page_skipping_option_for_vacuumdb_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
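For reference, the server-side syntax this would wrap already exists in 9.6; the vacuumdb flag spelling below is the patch's proposal, not something committed:

    -- existing 9.6 syntax, one table at a time:
    VACUUM (DISABLE_PAGE_SKIPPING) accounts;
    -- the proposed convenience, several tables at once:
    vacuumdb --disable-page-skipping -t accounts -t branches postgres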
Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server
> While checking for shippability, we build the target list which is passed > to > the foreign server as fdw_scan_tlist. The target list contains > a. All the GROUP BY expressions > b. Shippable entries from the target list of the upper relation > c. Var and Aggref nodes from non-shippable entries from the target list of > the upper relation > The code in the patch doesn't seem to add Var nodes explicitly. It assumes that the Var nodes will be part of the GROUP BY clause. The code is correct, I think. > d. Var and Aggref nodes from non-shippable HAVING conditions. > This needs to be changed as per the comments below. > > Costing: > > If use_foreign_estimate is true for the input relation, as in the JOIN case, we use > EXPLAIN output to get the cost of the query with aggregation/grouping on the > foreign server. If not, we calculate the costs locally. Similar to core, we > use > get_agg_clause_costs() to get costs for aggregation and then, using logic > similar to cost_agg(), we calculate startup and total cost. Since we have no > idea which aggregation strategy will be used on the foreign side, we add all > startup costs (startup cost of the input relation, aggregates etc.) into the startup > cost for the grouping path, and similarly for the total cost. > This looks OK to me. > > Deparsing the query: > > The target list created while checking for shippability is deparsed using > deparseExplicitTargetList(). sortgroupref values are adjusted according to this > target list. Most of the logic to deparse an Aggref is inspired by > get_agg_expr(). For an upper relation, FROM and WHERE clauses come from the > underlying scan relation and thus, for simplicity, the FROM clause deparsing > logic > is moved from deparseSelectSql() to a new function deparseFromClause(). The > same function adds the WHERE clause to the remote SQL. > Ok. > > > Area of future work: > > 1. Adding a path with pathkeys to push the ORDER BY clause along with > aggregation/ > grouping. Should be supported as a separate patch. > > 2. Grouping Sets/Rollup/Cube is not supported in the current version. I have > left > this aside to keep the patch smaller. If required I can add that support in the > next version of the patch. > I am fine with these limitations for a first cut of this feature. I think we should try to measure the performance gain from aggregate pushdown. The EXPLAIN doesn't show the actual improvement in the execution times. Here are the comments on the patch. Patch compiles without errors/warnings - Yes. make check passes - Yes. make check in contrib/postgres_fdw passes - Yes. The attached patch (based on your patch) has some typos corrected and some comments rephrased. It also has some code changes, as explained in various comments below. Please see if those look good. 1. Usually for any deparseSomething function, Something is the type of node produced by the parser when the string output by that function is parsed. deparseColumnRef, for example, produces a string which, when parsed, produces a columnRef node. There are no nodes of type FromClause or AggOrderBy. I guess you meant FromExpr instead of FromClause. deparseAggOrderBy() may be renamed as appendOrderByFromList() or something similar. It may be utilized for window functions if required. 2. Can we infer the value of the foragg flag from the RelOptInfo passed to is_foreign_expr()? For any upper relation, it is OK to have an aggregate in there. For any other relation an aggregate is not expected. 3. In function foreign_grouping_ok(), should we use classifyConditions()? The function is written and used for base relations.
There's nothing in that function which prohibits it from being used for other relations. I feel that foreign_join_ok() should have used the same function to classify the other clauses. 4. Maybe the individual criteria in the comment block +/* + * Aggregate is safe to pushdown if + * 1. It is a built-in aggregate + * 2. All its arguments are safe to push-down + * 3. The functions involved are immutable. + * 4. Other expressions involved like aggorder, aggdistinct are + *safe to be pushed down. + */ should be associated with the code blocks which implement them. As the criteria change, it's difficult to maintain the numbered list in sync with the code. 5. The comment +/* Aggregates other than simple one are non-pushable. */ should read /* Only non-split aggregates are pushable. */ as AGGSPLIT_SIMPLE means a complete, non-split aggregation step. 6. An aggregate function has transition, combination and finalization functions associated with it. Should we check whether all of those functions are shippable? But it probably suffices to check whether the aggregate function as a whole is shippable using is_shippable(), since it's the whole aggregate we are interested in and not the intermediate results. Probably we should add a comment explaining why it's sufficient to check the aggregate function
Re: [HACKERS] Default make target in test modules
Alvaro Herrera writes: > Tom Lane wrote: >> Alvaro Herrera writes: >>> I suppose this is a very easy mistake to make -- but also fortunately an >>> easy one to correct. Do you want me to fix the affected modules? >> I was going to do it, but if you want to, feel free. > Done. Thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cache Hash Index meta page.
On 09/05/2016 02:50 PM, Mithun Cy wrote: On Sep 2, 2016 7:38 PM, "Jesper Pedersen" wrote: Could you provide a rebased patch based on Amit's v5? Please find the patch, based on Amit's V5. I have fixed the following things: 1. Now in "_hash_first" we check if (opaque->hasho_prevblkno == InvalidBlockNumber) to see if the bucket is from an older-version hash index and has been upgraded. Since, as of now, InvalidBlockNumber is one greater than the maximum value that "metap->hashm_maxbucket" can be set to (see _hash_expandtable), we can distinguish it from the rest. I tested the upgrade issue reported by Amit. It is fixed now. 2. One case in which a bucket's hasho_prevblkno is used is where we do a backward scan. So now, before testing the previous block number, I test whether the current page is a bucket page; if so, we end the bucket scan (see changes in _hash_readprev). In other places where hasho_prevblkno is used it is not for a bucket page, so I have not put in any extra check to verify whether it is a bucket page. I think that the + pageopaque->hasho_prevblkno = metap->hashm_maxbucket; trick should be documented in the README, as hashm_maxbucket is defined as uint32 whereas hasho_prevblkno is a BlockNumber. (All bucket variables should probably use the Bucket definition instead of uint32). For the archives, this patch conflicts with the WAL patch [1]. [1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
Rahila Syed wrote: > However, including the check here will require returning the status > of the command from this hook, i.e. if we throw an error inside this > hook we will need to return false indicating the value has not changed. Looking at the other variables' hooks, they already emit errors and can deny the effect of a change corresponding to a new value, without informing the caller. Why would autocommit be different? For example, echo_hook does this: /* ...if the value is in (queries,errors,all,none) then assign pset.echo accordingly ... */ else { psql_error("unrecognized value \"%s\" for \"%s\"; assuming \"%s\"\n", newval, "ECHO", "none"); pset.echo = PSQL_ECHO_NONE; } If the user issues \set ECHO FOOBAR, then it produces the above error message and makes the same internal change as if \set ECHO none had been issued. But the value of the variable as seen by the user is still FOOBAR: \set [...other variables...] ECHO = 'FOOBAR' The proposed patch doesn't follow that behavior, as it denies a new value for AUTOCOMMIT. You might argue that it's better, but on the other hand, there are two reasons against it: - it's not consistent with the other uses of \set that affect psql, such as ECHO, ECHO_HIDDEN, ON_ERROR_ROLLBACK, COMP_KEYWORD_CASE... and even AUTOCOMMIT, as \set AUTOCOMMIT FOOBAR is not denied, just reinterpreted. - it doesn't address the case of another method than \set being used to set the variable, as in: SELECT 'on' as "AUTOCOMMIT" \gset whereas the hook would work in that case. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
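For comparison, psql's current hook for this variable (from startup.c; quoted from memory, so treat it as approximate) already reinterprets rather than rejects, by going through ParseVariableBool():

    static void
    autocommit_hook(const char *newval)
    {
        /* ParseVariableBool() warns about an unrecognized value and falls
         * back on a default, mirroring the ECHO behavior quoted above */
        pset.autocommit = ParseVariableBool(newval, "AUTOCOMMIT");
    }

So the question in this thread is whether AUTOCOMMIT alone should deviate from that pattern.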
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas wrote: > I still think tuplesort_heap_siftup is a confusing name, although I'm not > sure that Peter's "compact" is much better. I suggest that we rename it to > tuplesort_heap_delete_top(). In comments within the function, explain that > the *loop* corresponds to the "siftup" in Knuth's book. I'm also fine with that name. > Interestingly, as Knuth explains the siftup algorithm, it performs a > "replace the top" operation, rather than "remove the top". But we use it to > remove the top, by moving the last element to the top and running the > algorithm after that. Peter's other patch, to change the way we currently > replace the top node, to do it in one pass rather than as delete+insert, is > actually Knuth's siftup algorithm. Knuth must have a strange sense of humor. > Peter, looking at your "displace" patch in this light, I think > tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm > calling it now), should share a common subroutine. Displace replaces the top > node with the new node passed as argument, and then calls the subroutine to > push it down to the right place. Delete_top replaces the top node with the > last value in the heap, and then calls the subroutine. Or perhaps delete_top > should remove the last value from the heap, and then call displace() with > it, to re-insert it. I can see the value in that, but I'd still rather not. The code reuse win isn't all that large, and having a separate tuplesort_heap_root_displace() routine allows us to explain what's going on for merging, to assert what we're able to assert with tape numbers vs. heap position, and to lose the HEAPCOMPARE()/checkIndex cruft that the existing routines need (if only barely) to support replacement selection. I would be surprised if with a very tight inner loop like this, HEAPCOMPARE() has an appreciable overhead (maybe it increases pipeline stalls). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
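To make the terminology concrete, here is a generic sketch of the shared "sink the new root" loop both functions could call (a hand-rolled illustration; tuplesort's real code additionally deals with checkIndex/HEAPCOMPARE and its state struct):

    /* Replace the root of a binary min-heap held in memtuples[0..n) with
     * *newtup, then sink it until the heap invariant holds again. */
    static void
    heap_sink_root(SortTuple *memtuples, int n, SortTuple *newtup,
                   int (*cmp) (const SortTuple *a, const SortTuple *b))
    {
        int         i = 0;

        for (;;)
        {
            int         child = 2 * i + 1;  /* left child of i */

            if (child >= n)
                break;
            /* pick the smaller of the two children, if both exist */
            if (child + 1 < n && cmp(&memtuples[child + 1], &memtuples[child]) < 0)
                child++;
            /* stop once newtup is no larger than the smaller child */
            if (cmp(newtup, &memtuples[child]) <= 0)
                break;
            memtuples[i] = memtuples[child];    /* pull that child up a level */
            i = child;
        }
        memtuples[i] = *newtup;                 /* final resting place */
    }

Delete-top is then "pop the last element, decrement n, sink it from the root", while displace sinks a caller-supplied tuple with n unchanged.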
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Sep 8, 2016 at 2:13 PM, Peter Geoghegan wrote: > On Thu, Sep 8, 2016 at 8:53 AM, Claudio Freire wrote: >> setup: >> >> create table lotsofitext(i text, j text, w text, z integer, z2 bigint); >> insert into lotsofitext select cast(random() * 10.0 as text) >> || 'blablablawblabla', cast(random() * 10.0 as text) || >> 'blablablawjjjblabla', cast(random() * 10.0 as text) || >> 'blablabl >> awwwabla', random() * 10.0, random() * 1.0 from >> generate_series(1, 1000); >> >> timed: >> >> select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t; >> >> Unpatched Time: 100351.251 ms >> Patched Time: 75180.787 ms >> >> That's like a 25% speedup on random input. As we say over here, rather >> badly translated, not a turkey's boogers (meaning "nice!") > > Cool! What work_mem setting were you using here? The script iterates over a few variations of string patterns (easy comparisons vs hard comparisons), work mem (4MB, 64MB, 256MB, 1GB, 4GB), and table sizes (~350M, ~650M, ~1.5G). That particular case I believe is using work_mem=4MB, easy strings, 1.5GB table. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
On 9/8/16 11:16 AM, Tom Lane wrote: > This is a problem, if ICU won't guarantee cross-version compatibility, > because it destroys the argument that moving to ICU would offer us > collation behavior stability. It would offer a significant upgrade over the current situation. First, it offers stability inside the same version. Whereas glibc might change a collation in a minor upgrade, ICU won't do that. And the postgres binary is bound to a major version of ICU by the soname (which changes with every major release). So this would avoid the situation that a simple OS update could break collations. Second, it offers a way to detect that something has changed. With glibc, you don't know anything unless you read the source diffs. With ICU, you can compare the collation version before and after and at least tell the user that they need to refresh indexes or whatever. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
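The detection mechanism Peter describes is ICU's collator-version API; a sketch of the check (error handling elided, and how PostgreSQL would record and store the old version is exactly the design question for the patch):

    #include <stdio.h>
    #include <unicode/ucol.h>

    /* Print the collation version for a locale. */
    static void
    print_collation_version(const char *locale)
    {
        UErrorCode  status = U_ZERO_ERROR;
        UCollator  *coll = ucol_open(locale, &status);
        UVersionInfo ver;
        char        verstr[U_MAX_VERSION_STRING_LENGTH];

        ucol_getVersion(coll, ver);     /* changes whenever sort order changes */
        u_versionToString(ver, verstr);
        printf("%s: %s\n", locale, verstr);
        /* compare verstr against the version recorded when an index was
         * built; on mismatch, tell the user to REINDEX */
        ucol_close(coll);
    }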
Re: [HACKERS] Default make target in test modules
Tom Lane wrote: > I happened to notice that if you type "make" in > src/test/modules/test_pg_dump, you will get a "make check" action > not "make all". I hope this is just somebody being thoughtless > about Makefile ordering and not an intentional override of the > default make target. If the latter, I beg to differ about it > being a good idea. Strange. Don't all these makefiles depend on the pgxs stuff emitting something sane, which would have "all" as the first one? ... Ooh, I see that test_pg_dump adds "check" before including pgxs, which is what AFAICS causes the problem. I suppose this is a very easy mistake to make -- but also fortunately an easy one to correct. Do you want me to fix the affected modules? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
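Condensed to its essence (not the actual test_pg_dump Makefile), the mistake and the fix: make's default goal is simply the first target it parses, so a rule declared before the include wins.

    # broken: "check" is the first target make sees, so it becomes the
    # default goal and plain "make" runs the tests
    check:
    	$(prove_check)
    include $(top_builddir)/src/Makefile.global
    include $(top_srcdir)/src/makefiles/pgxs.mk

    # fixed: do the includes first; pgxs.mk defines "all" before anything
    # else, so plain "make" builds, and "check" is just an extra target
    include $(top_builddir)/src/Makefile.global
    include $(top_srcdir)/src/makefiles/pgxs.mk
    check:
    	$(prove_check)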
Re: [HACKERS] Re: [COMMITTERS] pgsql: Make initdb's suggested "pg_ctl start" command line more reliabl
Peter Eisentraut writes: > More generally, I'm concerned that appendShellString() looks pretty > attractive for future use. It's not inconceivable that someone will > want to use it for say calling pg_dump from pg_dumpall or pg_upgrade at > some point, and then maybe we'll accidentally disallow LF/CR in > tablespace names, say. That's fair, but I'm not sure how you propose to avoid it, given that we don't have a method for safely quoting those characters on Windows. I would certainly far rather find a way to do that and then remove the prohibition. But lacking such a solution, we're going to be forced into messy compromises. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Default make target in test modules
Alvaro Herrera writes: > Tom Lane wrote: >> I happened to notice that if you type "make" in >> src/test/modules/test_pg_dump, you will get a "make check" action >> not "make all". > Strange. Don't all these makefiles depend on the pgxs stuff emitting > something sane, which would have "all" as the first one? ... Ooh, I see > that test_pg_dump adds "check" before including pgxs, which is what > AFAICS causes the problem. Right. > I suppose this is a very easy mistake to make -- but also fortunately an > easy one to correct. Do you want me to fix the affected modules? I was going to do it, but if you want to, feel free. (BTW, I notice that test_pg_dump's Makefile misquotes its own path name, probably copy-paste sloppiness.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump with tables created in schemas created by extensions
Michael Paquier writes: > On Tue, Aug 30, 2016 at 5:43 AM, Martín Marqués > wrote: >> This is v4 of the patch, which is actually a cleaner version from the >> v2 one Michael sent. > Let's do as you suggest then, and just focus on the schema issue. I > just had an extra look at the patch and it looks fine to me. So the > patch is now switched as ready for committer. Pushed with cosmetic adjustments (mainly, I thought the comment needed to be much more explicit). I'm not sure whether we want to try to fix this in older branches. It would be a significantly larger patch, and maybe more of a behavior change than we want to make in stable branches. So I'm inclined to call it done at this point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] logical replication WIP: FailedAssertion("!(list_length(stmt->tables) > 0)", File: "publicationcmds.c", Line: 350)
Hi, I found these steps to a reliable crash (I know the patch is WIP but I assume you want to hear about these). (Running a single instance suffices) psql (10devel_logical_replication_20160901_1237_6f7c0ea32f80) Type "help" for help. (PID 9809) [testdb] # CREATE PUBLICATION pub_all; CREATE PUBLICATION (PID 9809) [testdb] # ALTER PUBLICATION pub_all FOR ALL TABLES; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Time: 40851.246 ms (PID ) [] ? > Logfile: [...] 2016-09-08 19:24:43.157 CEST 9765 LOG: MultiXact member wraparound protections are now enabled 2016-09-08 19:24:43.160 CEST 9770 LOG: logical replication launcher started 2016-09-08 19:24:43.160 CEST 9764 LOG: database system is ready to accept connections TRAP: FailedAssertion("!(list_length(stmt->tables) > 0)", File: "publicationcmds.c", Line: 350) 2016-09-08 19:25:45.673 CEST 9764 LOG: server process (PID 9809) was terminated by signal 6: Aborted 2016-09-08 19:25:45.673 CEST 9764 DETAIL: Failed process was running: ALTER PUBLICATION pub_all FOR ALL TABLES; 2016-09-08 19:25:45.674 CEST 9764 LOG: terminating any other active server processes 2016-09-08 19:25:45.676 CEST 9764 LOG: all server processes terminated; reinitializing 2016-09-08 19:25:47.463 CEST 9819 LOG: database system was interrupted; last known up at 2016-09-08 19:24:43 CEST 2016-09-08 19:25:47.465 CEST 9820 FATAL: the database system is in recovery mode 2016-09-08 19:25:47.493 CEST 9819 LOG: database system was not properly shut down; automatic recovery in progress 2016-09-08 19:25:47.494 CEST 9819 LOG: redo starts at 0/A6D8C070 2016-09-08 19:25:47.494 CEST 9819 LOG: invalid record length at 0/A6D8D498: wanted 24, got 0 2016-09-08 19:25:47.494 CEST 9819 LOG: redo done at 0/A6D8D460 2016-09-08 19:25:47.494 CEST 9819 LOG: last completed transaction was at log time 2016-09-08 19:25:00.774988+02 2016-09-08 19:25:47.511 CEST 9819 LOG: MultiXact member wraparound protections are now enabled 2016-09-08 19:25:47.514 CEST 9826 LOG: logical replication launcher started 2016-09-08 19:25:47.515 CEST 9764 LOG: database system is ready to accept connections Thanks, Erik Rijkers (BTW, the issue I reported a few days ago was indeed avoided when I created a receiving table subscriber-side, thanks) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add support for restrictive RLS policies
Stephen Frost writes: > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: >> Can't you keep those words as Sconst or something (DefElems?) until the >> execution phase, so that they don't need to be keywords at all? > Seems like we could do that, though I'm not convinced that it really > gains us all that much. These are only unreserved keywords, of course, > so they don't impact users the way reserved keywords (of any kind) can. > While there may be some places where we use a string to represent a set > of defined options, I don't believe that's typical -1 for having to write them as string literals; but I think what Alvaro really means is to arrange for the words to just be identifiers in the grammar, which you strcmp against at execution. See for example reloption_list. (Whether you use DefElem as the internal representation is a minor detail, though it might help for making the parsetree copyObject-friendly.) vacuum_option_elem shows another way to avoid making a word into a keyword, although to me that one is more of an antipattern; it'd be better to leave the strcmp to execution, since there's so much other code that does things that way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
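A condensed sketch of that pattern applied here (illustrative only, not the grammar that was ultimately committed):

    /* gram.y: accept a bare identifier, adding no new keywords */
    opt_permissive: AS IDENT        { $$ = $2; }
                  | /*EMPTY*/       { $$ = NULL; }
            ;

    /* execution time, e.g. in CreatePolicy(): validate with strcmp */
    if (stmt->permissive_str == NULL ||
        strcmp(stmt->permissive_str, "permissive") == 0)
        permissive = true;
    else if (strcmp(stmt->permissive_str, "restrictive") == 0)
        permissive = false;
    else
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("unrecognized policy option \"%s\"",
                        stmt->permissive_str)));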
Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Tom Lane wrote: > Hey Alvaro, can you comment on this bit in the proposed patch? > > + for (i = FirstOffsetNumber; i <= itemcount; i++) > + { > + ItemId ii = PageGetItemId(phdr, i); > + > + /* Normally here was following assertion > + * Assert(ItemIdHasStorage(ii)); > + * This assertion was introduced in PageIndexTupleDelete > + * But if this function will be used from BRIN index > + * this assertion will fail. Thus, here we do not check > that > + * item has the storage. > + */ > + if (ItemIdGetOffset(ii) <= offset) > + ii->lp_off += size_diff; > + } > + } > > That comment seems utterly wrong to me, because both PageIndexTupleDelete > and PageIndexMultiDelete certainly contain assertions that every item on > the page has storage. Are you expecting that any BRIN index items > wouldn't? If they wouldn't, is adjusting their lp_off as if they did > have storage sensible? It is possible in BRIN to have empty intermediate tuples; for example it is possible for lp 1 and 3 to contain index tuples, while lp 2 does not. This is a tricky case to reproduce, but I think this should do it: consider a table with pages_per_range=1. Page 1 is summarized by index tuple 1, page 2 is summarized by itup 2, page 3 is summarized by index tuple 3. On heap page 2 a new tuple is inserted and the summary data is now much larger, such that the summary tuple no longer fits in the index page, so it is moved to a new index page. Then index tuple 2 in the first index page becomes unused. Revmap still points to lp 3, so it can't be moved to lp 2. The lp_offs can be adjusted, as long as the lp itself is not moved from its original position. Now if this loop is concerned only with live lps and does not move lps, then it should be fine to add the assertion. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On Thu, Sep 8, 2016 at 2:09 PM, Tom Lane wrote: > Well, my vote is that it ain't broke and we shouldn't fix it. To take a step back, what prompted this whole discussion is the patch that I wrote that shifts down, replacing calls to tuplesort_heap_siftup() and tuplesort_heap_insert with one new call to a function I've called tuplesort_heap_root_displace() (today, Claudio reports that it makes some of his test queries go 25% faster). This new function shifts down. It's not clear what I'm supposed to say about that, given the current understanding. So, in a sense, it's blocking on this. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Preventing deadlock on parallel backup
Tom, yes, that is what I mean. It is what pg_dump uses to get things synchronized. It seems to me a clear marker that the same task is using more than one connection to accomplish a single job. On 08/09/2016 6:34 PM, "Tom Lane" wrote: > Lucas writes: > > The queue jumping logic can not use the distributed transaction id? > > If we had such a thing as a distributed transaction id, maybe the > answer could be yes. We don't. > > I did wonder whether using a shared snapshot might be a workable proxy > for that, but haven't pursued it. > > regards, tom lane >
Re: [HACKERS] mdtruncate leaking fd.c handles?
Andres Freund writes: > Am I missing something or is md.c:mdtruncate() leaking open files? Yeah, I think you're right. > This only really matters for VACUUM truncate and ON COMMIT TRUNCATE temp > table truncation afaics. Also, you'd need a table > 1GB to leak anything at all, which probably helps explain why it hasn't been noticed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mdtruncate leaking fd.c handles?
On 2016-09-08 18:39:56 -0400, Tom Lane wrote: > Andres Freund writes: > > Am I missing something or is md.c:mdtruncate() leaking open files? > > Yeah, I think you're right. > > > This only really matters for VACUUM truncate and ON COMMIT TRUNCATE temp > > table truncation afaics. > > Also, you'd need a table > 1GB to leak anything at all, which probably > helps explain why it hasn't been noticed. Heh, this bug is of some older vintage... Appears to originate in 1a5c450f3024ac57cd6079186c71b3baf39e84eb - before that we did a FileUnlink(), which includes a FileClose(). Will fix. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
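The fix is presumably as simple as restoring the close that the old FileUnlink() call performed, roughly (a sketch against mdtruncate()'s segment loop; the committed fix may differ in detail):

    /* For a segment that lies wholly beyond the new EOF: truncate it to
     * zero length as before ... */
    if (FileTruncate(v->mdfd_vfd, 0) < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not truncate file \"%s\": %m",
                        FilePathName(v->mdfd_vfd))));
    /* ... but also close the fd.c handle before forgetting the segment;
     * this is the close that used to happen inside FileUnlink() */
    FileClose(v->mdfd_vfd);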
[HACKERS] Preventing deadlock on parallel backup
People, I made a small modification in pg_dump to prevent parallel backup failures due to exclusive lock requests made by other tasks. The modification I made takes shared locks for each parallel backup worker at the very beginning of the job. That way, any other job that attempts to acquire exclusive locks will wait for the backup to finish. In my case, each server was taking a day to complete the backup; now with parallel backup one is taking 3 hours and the others less than an hour. The code below is not very elegant, but it works for me. My wishlist for the backup is: 1) replace the plpgsql with C code reading the backup TOC and assembling the lock commands. 2) create a timeout for the locks. 3) broadcast the end of copy to every worker in order to release the locks as early as possible; 4) create a monitor thread that prioritizes a copy job when an exclusive lock is requested; 5) grant the lock to another connection of the same distributed transaction if it is held by any connection of the same distributed transaction. Is there some side effect I can't see in that? 1 to 4 are within my capabilities and I may do them in the future. 5 is too advanced for me and I do not dare to mess with something so fundamental right now. Is anyone else working on that? In Parallel.c, in void RunWorker(...), add: PQExpBuffer query; PGresult *res; query = createPQExpBuffer(); resetPQExpBuffer(query); appendPQExpBuffer(query, "do language 'plpgsql' $$" " declare " "x record;" " begin" "for x in select * from pg_tables where schemaname not in ('pg_catalog','information_schema') loop" "raise info 'lock table %.%', x.schemaname, x.tablename;" "execute 'LOCK TABLE '||quote_ident(x.schemaname)||'.'||quote_ident(x.tablename)||' IN ACCESS SHARE MODE NOWAIT';" "end loop;" "end" "$$" ); res = PQexec(AH->connection, query->data); if (!res || PQresultStatus(res) != PGRES_COMMAND_OK) exit_horribly(modulename,"Could not lock the tables to begin the work\n\n"); PQclear(res); destroyPQExpBuffer(query);
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Wed, Sep 7, 2016 at 3:29 AM, Ashutosh Sharma wrote: > > Thanks to Ashutosh Sharma for doing the testing of the patch and > > helping me in analyzing some of the above issues. > > Hi All, > > I would like to summarize the test-cases that I have executed for > validating the WAL logging in hash index feature. > > 1) I have mainly run the pgbench test with read-write workload at the > scale factor of 1000 and various client counts like 16, 64 and 128 for > time durations of 30 mins, 1 hr and 24 hrs. I have executed this test > on a highly configured power2 machine with 128 cores and 512GB of RAM. I > ran the test-case both with and without the replication setup. > > Please note that I have changed the schema of the pgbench tables created > during the initialisation phase. > > The new schema of the pgbench tables looks as shown below on both master > and standby: > > postgres=# \d pgbench_accounts >Table "public.pgbench_accounts" > Column | Type | Modifiers > --+---+--- > aid | integer | not null > bid | integer | > abalance | integer | > filler | character(84) | > Indexes: > "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) > "pgbench_accounts_bid" hash (bid) > > postgres=# \d pgbench_history > Table "public.pgbench_history" > Column |Type | Modifiers > +-+--- > tid| integer | > bid| integer | > aid| integer | > delta | integer | > mtime | timestamp without time zone | > filler | character(22) | > Indexes: > "pgbench_history_bid" hash (bid) > > Hi Ashutosh, This schema will test the maintenance of hash indexes, but it will never use hash indexes for searching, so it limits the amount of test coverage you will get. While searching shouldn't generate novel types of WAL records (that I know of), it will generate locking and timing issues that might uncover bugs (if there are any left to uncover, of course). I would drop the primary key on pgbench_accounts and replace it with a hash index and test it that way (except I don't have a 128 core machine at my disposal, so really I am suggesting that you do this...) The lack of a primary key and the non-uniqueness of the hash index should not be an operational problem, because the built-in pgbench runs never attempt to violate the constraints anyway. In fact, I'd replace all of the indexes on the rest of the pgbench tables with hash indexes, too, just for additional testing; see the example below. I plan to do testing using my own testing harness after changing it to insert a lot of dummy tuples (ones with negative values in the pseudo-pk column, which are never queried by the core part of the harness) and deleting them at random intervals. I think that none of pgbench's built-in tests are likely to give the bucket splitting and squeezing code very much exercise. Is there a way to gather statistics on how many of each type of WAL record are actually getting sent over the replication link? The only way I can think of is to turn on WAL archiving as well as replication, then use pg_xlogdump to gather the stats. I've run my original test for a while now and have not seen any problems. But I realized I forgot to compile with --enable-cassert, so I will have to redo it to make sure the assertion failures have been fixed. In my original testing I did very rarely get a deadlock (or some kind of hang), and I haven't seen that again so far. It was probably the same source as the one Mark observed, and so the same fix. Cheers, Jeff
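Spelled out, the schema change Jeff suggests would look like this (safe for pgbench's built-in scripts, which never insert duplicate keys; the index names here are arbitrary):

    ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
    CREATE INDEX pgbench_accounts_aid ON pgbench_accounts USING hash (aid);
    -- likewise for the other tables, e.g.:
    ALTER TABLE pgbench_branches DROP CONSTRAINT pgbench_branches_pkey;
    CREATE INDEX pgbench_branches_bid ON pgbench_branches USING hash (bid);

As for per-record-type WAL statistics, with archiving enabled something like pg_xlogdump --stats=record <first-segment> <last-segment> gives a breakdown by record type.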
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On Thu, Sep 8, 2016 at 12:46 PM, Tom Lane wrote: > > /* > * The tuple at state->memtuples[0] has been removed from the heap. > - * Decrement memtupcount, and sift up to maintain the heap invariant. > + * Decrement memtupcount, and shift down to maintain the heap invariant. > */ > static void > -tuplesort_heap_siftup(Tuplesortstate *state, bool checkIndex) > +tuplesort_heap_delete_top(Tuplesortstate *state, bool checkIndex) > > I don't find this clearer at all, because > > (1) As noted in the comment, the heap top has *already* been removed, > we're just trying to re-establish the heap invariant. So maybe > "delete_top" isn't the best name after all. This is why I suggested tuplesort_heap_compact(). When merging, there is a comment associated with a call to the function, "compact the heap". I based my name off of that, thinking that that was really no different to any other caller. > (2) "shift down" seems a bit weird, since we're moving data closer to > the heap top, not further away from it; why isn't that direction "up"? Well, the fact that the routine does that makes it a higher level thing than something like a sift or a shift. It might just as easily be some other tuple (e.g., from the caller), as far as "shifting down" goes. (I wrote a patch where for merging, it *is* from the caller, and the heap stays the same size, which Heikki mentioned on this thread.) > (3) "shift" makes it sound like it ought to be a simple memmove > kind of operation, which it surely is not. Then, how about the Sedgewick terminology, "sink"? I'm really not attached to that aspect. I do think that "shift down" is unambiguous, but I'd just as soon use "sink", or even explicitly spell it out. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Hey Alvaro, can you comment on this bit in the proposed patch? + for (i = FirstOffsetNumber; i <= itemcount; i++) + { + ItemId ii = PageGetItemId(phdr, i); + + /* Normally here was following assertion +* Assert(ItemIdHasStorage(ii)); +* This assertion was introduced in PageIndexTupleDelete +* But if this function will be used from BRIN index +* this assertion will fail. Thus, here we do not check that +* item has the storage. +*/ + if (ItemIdGetOffset(ii) <= offset) + ii->lp_off += size_diff; + } + } That comment seems utterly wrong to me, because both PageIndexTupleDelete and PageIndexMultiDelete certainly contain assertions that every item on the page has storage. Are you expecting that any BRIN index items wouldn't? If they wouldn't, is adjusting their lp_off as if they did have storage sensible? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CVE-2016-1238 fix breaks (at least) pg_rewind tests
On 2016-09-08 18:13:06 -0300, Alvaro Herrera wrote: > I suppose -I$(srcdir) should be fine. (Why the quotes?) Because quoting correctly seems like a good thing to do? Most people won't have whitespace in there, but it doesn't seem impossible? > > check-world appears to mostly run (still doing so, but it's mostly > > through everything relevant). Passed successfully since. > > I can't vouch for the windows stuff, and > > the invocations indeed look vulnerable. I'm not sure if the fix actually > > matters on windows, given . is the default for pretty much everything > > there. > > Well, maybe it doesn't matter now but as I understand the fix is going > to enter the next stable upstream perl, so it'll fail eventually. It'd > be saner to just fix the thing completely so that we can forget about > it. Yea, it'd need input from somebody on windows. Michael? What happens if you put a "remove . from @INC" line (like upthread) in the MSVC stuff? Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
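For anyone wanting to reproduce the hardened behavior on an unpatched perl, the "remove . from @INC" trick mentioned upthread is just (with File::Temp standing in for any module load):

    # drop the trailing "." from @INC before any module loading happens,
    # mimicking the CVE-2016-1238 fix
    perl -e 'BEGIN { pop @INC if $INC[-1] eq "." } use File::Temp;'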
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On Thu, Sep 8, 2016 at 2:19 PM, Peter Geoghegan wrote: >> Peter, looking at your "displace" patch in this light, I think >> tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm >> calling it now), should share a common subroutine. Displace replaces the top >> node with the new node passed as argument, and then calls the subroutine to >> push it down to the right place. Delete_top replaces the top node with the >> last value in the heap, and then calls the subroutine. Or perhaps delete_top >> should remove the last value from the heap, and then call displace() with >> it, to re-insert it. > > I can see the value in that, but I'd still rather not. The code reuse > win isn't all that large, and having a separate > tuplesort_heap_root_displace() routine allows us to explain what's > going on for merging, to assert what we're able to assert with tape > numbers vs. heap position, and to lose the HEAPCOMPARE()/checkIndex > cruft that the existing routines need (if only barely) to support > replacement selection. I would be surprised if with a very tight inner > loop like this, HEAPCOMPARE() has an appreciable overhead (maybe it > increases pipeline stalls). BTW, regarding this, since I read in some other thread that it was ok to use static inline functions, I believe the compiler is smart enough to optimize out the constant true/false in checkIndex for inlined calls, so that may be viable (on modern compilers). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM
On 9 Sep. 2016 03:45, "Corey Huinker" wrote: > > > Stylistically, would a separate .pl file for the emitter be preferable to something inline like > >> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"' I'd be fine with that and a suitable comment. Just be careful with different platforms' shell escaping rules.
Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM
On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer wrote: > On 9 Sep. 2016 03:45, "Corey Huinker" wrote: > > > > > > > Stylistically, would a separate .pl file for the emitter be preferable > to something inline like > > > >> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"' > > I'd be fine with that and a suitable comment. Just be careful with > different platforms' shell escaping rules. > Do perl command switches on windows/VMS use /e instead of -e? If so, that'd be a great argument for doing just "perl filename".
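The separate-file variant would be as simple as this (a sketch; the file name and rows are arbitrary), invoked as OPTIONS (program 'perl emit_rows.pl') in the foreign table definition:

    #!/usr/bin/perl
    # emit_rows.pl - emit two tab-separated rows for the file_fdw PROGRAM
    # test, sidestepping per-platform shell quoting of a -e one-liner
    print "a\tb\tcc\t4\n";
    print "b\tc\tdd\t5\n";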