Re: using expression syntax for partition bounds
On Sat, Jan 26, 2019 at 12:14:33PM +0900, Amit Langote wrote: > The describe lines are there just to show that the stored expressions are > not verbatim the same as the input expressions, so it seemed overkill to add > them for all of the partitions. I see, so per 7c079d7 this is why showing part_3 matters: you want to show the result of the expression after executing the DDL, and this has just been added: +CREATE TABLE part_3 PARTITION OF list_parted FOR VALUES IN ((2+1)); I think that it would be a good thing to show at least the NULL partition, because its partition definition has different semantics from the other three, so it could replace part_1. What do you think? -- Michael
Re: proposal: new polymorphic types - commontype and commontypearray
On Sat, Jan 26, 2019 at 1:20 Tom Lane wrote: > Pavel Stehule writes: > > Four years ago I proposed implicit casting to common type of arguments with > > anyelement type. > > https://www.postgresql.org/message-id/CAFj8pRCZVo_xoW0cfxt%3DmmgjXKBgr3Gm1VMGL_zx9wDRHmm6Cw%40mail.gmail.com > > My proposal was rejected, because it introduced compatibility issues. > > Yup. > > > Now I have a solution that doesn't break anything. With two new polymorphic > > types: commontype and commontypearray we can write functions like coalesce, > > greatest, .. > > I think this is a good idea at its core, but I don't like the specifics > too much. > > I agree with the basic idea of introducing a second, independent set of > polymorphic-type variables. Way back when we first discussed polymorphic > types, we thought maybe we should invent anyelement2 and anyarray2, and > perhaps even more pairs, to allow polymorphic functions to deal with two > or more base types. We didn't do that for lack of convincing examples of > the need for it, but I expected some would emerge; I'm rather astonished > that we've gotten by for so many years without adding a second set. > So where I think we should go with this is to solve that need while > we're at it. I still expect we can have more polymorphic types like any***(N). There are two important questions. 1. Does it make sense to have more distinct polymorphic types with the same behavior? I see a benefit in this possibility: we could introduce anyelement2 (say AN2, alongside the existing anyelement, AN) and have a function fx(AN, AN, AN2, AN2), meaning P2 must have the same type as P1, and P4 the same type as P3. 2. What is the strategy for choosing the real type for a polymorphic type? Now only equivalence is supported, but I can see more possibilities: * common type - which is what I implemented; * "first wins" - often used by Oracle. The common type strategy is more important, because it is typical for some "pseudo" functions like coalesce, least, greatest, ..
in Postgres, and extension developers can design functions that are more compatible with core functionality. The "first wins" strategy can be interesting for me (as the Orafce creator and maintainer): it can increase the similarity of the functions implemented there and reduce the work when queries are ported to Postgres. But that is important for a smaller group of PostgreSQL users. > > However, this proposal doesn't do so, because it omits "commonrange". > I'm prepared to believe that we don't need "commonenum"; that would > presumably have the semantics of "resolve the common type and then > it must be an enum". And that seems pretty useless, because there > are no type resolution rules that would let us choose one enum out of > a set. (I suppose somebody might create implicit casts between some > enum types, but it doesn't seem very likely.) I also suspect that > we could get away without "commonnonarray". Anynonarray is really > just a hack that we invented to avoid ambiguity around the || > operator, and an equivalent need would likely not come up for this > second set of types. (I could be wrong though; I'm not sure right > now whether array_agg's use of anynonarray rather than anyelement > is essential or just randomness.) But neither of those arguments > apply to commonrange; in fact it's highly likely that somebody would > want to have "myfunc(commontype, commontype) returns commonrange" > as a customized range constructor that can deal with slightly > different input types. I implemented just a minimal set of new polymorphic types, to demonstrate the idea. Better coverage of other variants (where it makes sense) is not a problem. Right now I am mostly searching for a design on which there can be some agreement. > > My second problem with this proposal is that it simply ignores > the naming precedent of the existing polymorphic types. We have > a convention that polymorphic types are named "any-something", > and I do not think we should just toss that overboard.
Moreover, > if we do end up needing "commonnonarray" or "commonenum", those > names are ugly, typo-prone, and unreasonably long. > > We could do worse than to call these types anyelement2, anyarray2, > anyrange2 and just document that their resolution rule depends > on finding a common type rather than identical base types. > I suppose that's not too pretty --- it reminds one of Oracle finally > getting varchar semantics right with varchar2 :-(. Another idea > is anyelementc, anyarrayc, anyrangec ("c" for "common") but that's > not pretty either. Anyway I think the names need to be any-something. I am open to any ideas. I don't like anyelement2, anyarray2 because a) it is not descriptive: really different behavior should not be signaled by just a number, and b) I can easily imagine more anyelementX types. I don't think length is too important a factor (though I fully agree that shorter is better here); polymorphic types are not too common. I thought about your proposed anyelementc, but the "c" is not very visible. Can we use snake_case notation?
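To make the difference between the two resolution rules in this thread concrete, here is a toy model (Python, purely illustrative; the cast hierarchy and function names are invented, and real PostgreSQL resolution uses type categories and preferred types rather than a simple chain): the existing anyelement rule demands identical argument types, while the proposed commontype rule resolves a common type, much as COALESCE and GREATEST already do.

```python
# A tiny, hypothetical implicit-cast hierarchy, for illustration only.
SUPERTYPES = {
    "int2": "int4",
    "int4": "int8",
    "int8": "numeric",
    "float4": "float8",
}

def ancestors(t):
    """Return t plus every type it implicitly casts to, in order."""
    chain = [t]
    while t in SUPERTYPES:
        t = SUPERTYPES[t]
        chain.append(t)
    return chain

def resolve_anyelement(arg_types):
    """Existing rule: all arguments must share one exact type."""
    if len(set(arg_types)) != 1:
        raise TypeError("anyelement arguments must have identical types")
    return arg_types[0]

def resolve_commontype(arg_types):
    """Proposed rule: pick the first supertype shared by all arguments."""
    for candidate in ancestors(arg_types[0]):
        if all(candidate in ancestors(t) for t in arg_types):
            return candidate
    raise TypeError("no common type")

print(resolve_commontype(["int2", "int8"]))  # int8
```

Under this model, fx(int2, int8) fails with the anyelement rule but resolves to int8 with the commontype rule, which is exactly the behavior gap the proposal targets.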
Re: PostgreSQL vs SQL/XML Standards
On 01/25/19 22:53, Pavel Stehule wrote: > Documentation review will be harder - I am not a native speaker and I don't > have the necessary knowledge of XQuery (probably only you have this > knowledge). The doc patch is substantial enough that I think it would be ideal to somehow attract a native speaker with an interest in documentation to review it. Even if that is someone without much XQuery-specific knowledge—I'll happily answer whatever questions they ask about that. :) -Chap
Re: PostgreSQL vs SQL/XML Standards
On 01/25/19 19:37, Chapman Flack wrote: > There is still nothing in this patch set to address [1], though that > also seems worth doing, perhaps in another patch, and probably not > difficult, perhaps needing only a regex. Heck, it could be even simpler than that. If some XML input has a DTD, the attempt to parse it as 'content' with libxml is sure to fail early in the parse (because that's where the DTD is found). If libxml reports enough error detail to show it failed because of a DTD—and it appears to: DETAIL: line 1: StartTag: invalid element name —then simply recognize that error and reparse as 'document' on the spot. The early-failing first parse won't have cost much, and there is probably next to nothing to gain by trying to be any more clever. The one complication might be that there seem to be versions of libxml that report error detail differently (hence the variant regression test expected files), so the code might have to recognize a couple different forms. -Chap > [1] https://www.postgresql.org/message-id/5BD1C44B.6040300%40anastigmatix.net
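For what it's worth, the simpler regex idea mentioned in the quoted message can be sketched like this (Python used purely for illustration, not the backend's implementation language; the pattern and names are mine, and a real version would need to think harder about encodings and pathological inputs): look for a DOCTYPE declaration before the first element and choose 'document' parsing up front.

```python
import re

# Matches a DOCTYPE declaration occurring before the first element tag,
# optionally preceded by an XML declaration and comments. Deliberately
# rough: a production implementation would need to be more careful.
DTD_RE = re.compile(r"^\s*(<\?xml[^?]*\?>)?\s*(<!--.*?-->\s*)*<!DOCTYPE",
                    re.DOTALL | re.IGNORECASE)

def has_dtd(xml_text: str) -> bool:
    """Guess whether the input carries a DTD and so would need to be
    parsed as 'document' rather than 'content'."""
    return DTD_RE.match(xml_text) is not None

print(has_dtd('<?xml version="1.0"?><!DOCTYPE note SYSTEM "n.dtd"><note/>'))  # True
print(has_dtd('<note/>'))  # False
```

The error-sniffing approach described above avoids needing any such pattern, at the cost of recognizing libxml's version-dependent error strings.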
Re: PostgreSQL vs SQL/XML Standards
Hi On Sat, Jan 26, 2019 at 4:25 Chapman Flack wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: not tested > Documentation: not tested > maybe it will be simpler to work with the original commitfest submissions. I can do a review of xmltable-xmlexists-passing-mechanisms-2.patch. Documentation review will be harder - I am not a native speaker and I don't have the necessary knowledge of XQuery (probably only you have this knowledge). > > As the reporter of the issues raised in this email thread, I've reviewed > the first patch > and contributed the second and third. > > WHAT THE PATCHES DO: > > xmltable-xpath-result-processing-bugfix-5.patch contains code changes > correcting > a subset of the issues that were raised in this email thread. > > xmltable-xmlexists-passing-mechanisms-2.patch adjusts the grammar to allow > the XML > parameter passing mechanism BY VALUE as well as BY REF. Both are ignored, > but > formerly BY VALUE was a syntax error, which was unintuitive considering > that BY VALUE > is the passing mechanism PostgreSQL implements (XML node identities are > not preserved). > > xml-functions-type-docfix-1.patch updates the documentation to reflect > the changes in > this patch set and the limitations identified in this thread. > > WHAT I HAVE REVIEWED: > > I have applied all three patches over 18c0da8 and confirmed that > installcheck-world passes > and that the code changes resolve the issues they set out to resolve. > > I've made no entry for "spec compliant" because the question is moot; the > spec is written > in terms of the XQuery language, types, and concepts, and these facilities > in PG are > implemented on XPath 1.0, which doesn't have those. But the changes in > this patch set > do make the PG behaviors more, well, closely analogous to the way the spec > compliant > functions would behave. 
> > WHAT I HAVE NOT REVIEWED: > > The passing-mechanisms and docfix patches are my own work, so there should > be another > reviewer who is not me. I've looked closely at the technical, SQL/XML > behavior aspects already, > but a reviewer with an eye for documentation would be welcome. > > I'll venture my opinion that this is ready-for-committer to the extent of > my own review, but will > leave the status at needs-review for a not-me reviewer to update.
Re: [PATCH] Allow UNLISTEN during recovery
> > > Thanks for insisting - I ended up setting up the environment and running > > the tests, and discovering that some test-related changes were missing. > > Here's a 3rd version of the patch. Hope this is now in good shape, let me > > know if you think anything else needs to be done. > > Lotta work for a one-line code change, eh? Pushed now. > Yes, especially when you're new to building and testing PostgreSQL :) Thanks for accepting!
Re: PostgreSQL vs SQL/XML Standards
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested As the reporter of the issues raised in this email thread, I've reviewed the first patch and contributed the second and third. WHAT THE PATCHES DO: xmltable-xpath-result-processing-bugfix-5.patch contains code changes correcting a subset of the issues that were raised in this email thread. xmltable-xmlexists-passing-mechanisms-2.patch adjusts the grammar to allow the XML parameter passing mechanism BY VALUE as well as BY REF. Both are ignored, but formerly BY VALUE was a syntax error, which was unintuitive considering that BY VALUE is the passing mechanism PostgreSQL implements (XML node identities are not preserved). xml-functions-type-docfix-1.patch updates the documentation to reflect the changes in this patch set and the limitations identified in this thread. WHAT I HAVE REVIEWED: I have applied all three patches over 18c0da8 and confirmed that installcheck-world passes and that the code changes resolve the issues they set out to resolve. I've made no entry for "spec compliant" because the question is moot; the spec is written in terms of the XQuery language, types, and concepts, and these facilities in PG are implemented on XPath 1.0, which doesn't have those. But the changes in this patch set do make the PG behaviors more, well, closely analogous to the way the spec compliant functions would behave. WHAT I HAVE NOT REVIEWED: The passing-mechanisms and docfix patches are my own work, so there should be another reviewer who is not me. I've looked closely at the technical, SQL/XML behavior aspects already, but a reviewer with an eye for documentation would be welcome. I'll venture my opinion that this is ready-for-committer to the extent of my own review, but will leave the status at needs-review for a not-me reviewer to update.
Re: pg_basebackup, walreceiver and wal_sender_timeout
On Fri, Jan 25, 2019 at 03:26:38PM +0100, Nick B wrote: > On the server we see this error firing: "terminating walsender process due to > replication timeout" > The problem occurs when the network or file system is acting very slow. One > example of such a case looks like this (strace output for fsync calls): > > 0.033383 fsync(8) = 0 <20.265275> > 20.265399 fsync(8) = 0 <0.11> > 0.022892 fsync(7) = 0 <48.350566> > 48.350654 fsync(7) = 0 <0.05> > 0.000674 fsync(8) = 0 <0.851536> > 0.851619 fsync(8) = 0 <0.07> > 0.67 fsync(7) = 0 <0.06> > 0.45 fsync(7) = 0 <0.05> > 0.031733 fsync(8) = 0 <0.826957> > 0.827869 fsync(8) = 0 <0.16> > 0.005344 fsync(7) = 0 <1.437103> > 1.446450 fsync(6) = 0 <0.063148> > 0.063246 fsync(6) = 0 <0.06> > 0.000381 +++ exited with 1 +++ These are a bit irregular. Which files are taking that long to complete while others are way faster? It may be something that we could improve on the base backup side, as there is no actual point in syncing segments while the backup is running, and we could delay that until the end of the backup (if I recall that stuff correctly). > This raises a question: why is the GUC handled the way it is? What would be > the correct way to solve this? Shall we change the behaviour or do a better > job explaining the implications of wal_sender_timeout in the > docs? The following commit and thread are the ones you are looking for here: https://www.postgresql.org/message-id/506972b9.6060...@vmware.com commit: 6f60fdd7015b032bf49273c99f80913d57eac284 committer: Heikki Linnakangas date: Thu, 11 Oct 2012 17:48:08 +0300 Improve replication connection timeouts. Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. 
Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. The docs could be improved to describe this better. -- Michael
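The rule described in that commit message can be sketched as follows (an illustrative model only, not the actual walsender code; the function name and shape are mine): ping the peer once half the timeout has passed with no traffic, and give up at the full timeout.

```python
# Toy model of the keepalive rule: with wal_sender_timeout = T, send a
# single ping after T/2 seconds of silence, and disconnect after T
# seconds of silence. All times are seconds since an arbitrary epoch.
def check_timeout(now, last_receipt, timeout, ping_sent):
    """Return the action to take: 'none', 'send_ping', or 'disconnect'."""
    elapsed = now - last_receipt
    if elapsed >= timeout:
        return "disconnect"
    if elapsed >= timeout / 2 and not ping_sent:
        return "send_ping"
    return "none"

# With a 60s timeout:
print(check_timeout(now=20, last_receipt=0, timeout=60, ping_sent=False))  # none
print(check_timeout(now=35, last_receipt=0, timeout=60, ping_sent=False))  # send_ping
print(check_timeout(now=61, last_receipt=0, timeout=60, ping_sent=True))   # disconnect
```

The point of the half-timeout ping is that a healthy peer gets a chance to respond before the full timeout expires, so only a genuinely dead connection is dropped.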
Re: using expression syntax for partition bounds
Hi, On Sat, Jan 26, 2019 at 12:01 Michael Paquier wrote: > On Sat, Jan 26, 2019 at 03:14:51AM +0900, Amit Langote wrote: > > How about replacing \d+ list_parted with a couple of \d on individual > > partitions, like in the attached? > > That would make it. Why just part_1 and part_3 though? It looks more > complete to add part_null and part_2 as well. The describe lines are there just to show that the stored expressions are not verbatim the same as the input expressions, so it seemed overkill to add them for all of the partitions. Thanks, Amit PS: away from a computer, typing on a smartphone
Re: crosstab/repivot...any interest?
Hello, I'm not a C coder and can't help, but I *love* cross-tab/pivot tables. They're the best, and just fantastic for preparing data to feed into various analysis tools. The tablefunc module is helpful, but a bit awkward to use (as I remember it). From a user's point of view, a high-performance cross-tab generator would be just fantastic. As I understand it, this is what's involved in a pivot: 1. Identify rows that should be *grouped* (consolidated). 2. Distinguish the value that identifies each *derived* column. 3. Distinguish the value that *identifies* each row-column value. 4. *Collapse* the rows, *build* the columns, and *populate* the 'cells' with data. In an ideal world, you would be able to perform different grouping operations, such as count, sum, avg, etc. If there's a way to do this in a system-wide and standards-pointing way, so much the better. *Apologies* if I'm violating list etiquette by jumping in here. I've been lurking on several Postgres lists for a bit and picking up interesting details every day. If I've been Unintentionally and Cluelessly Off, I'm fine with being told. On Sat, Jan 26, 2019 at 10:49 AM David Fetter wrote: > On Fri, Jan 25, 2019 at 04:31:00PM -0600, Merlin Moncure wrote: > > On Fri, Jan 25, 2019 at 3:16 PM David Fetter wrote: > > > > > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote: > > > > Hackers, > > > > > > > > We have a strong need to make a variant to the crosstab interface so > > > > that data that is pivoted one way would be sent through a crosstab > > > > like function so that it would be pivoted another way. For example, > > > > if you had > > > > > > > > row 0: a1, a2, a3, k1, c1, c2, ... > > > > row 1: a1, a2, a3, k2, c1, c2, ... > > > > row 2: a1, a2, a3, k3, c1, c2, ... > > > > ... 
> > > > > > > > where 'a' columns are uninteresting attribute columns, 'k' is the > > > > dimension we want to pivot on, and c1->cN would be stacked > vertically, > > > > so that we'd end up with, > > > > row 0: a1, a2, a3, c1, k1, k2, ... > > > > row 1: a1, a2, a3, c2, k1, k2, ... > > > > row 2: a1, a2, a3, c3, k1, k2, ... > > > > > > > > There are various SQL level approaches to this but they tend to be > > > > imperformant with large datasets so that I think a crosstab-like C > > > > implementation ought to be able to do better (or at least I hope so) > > > > since you have to cross product rows and columns in such a way that > > > > you can get a clean join. Cribbing from tablefunc.c I don't think > > > > this is a terrible challenge to do in hash table style. > > > > > > > > Questions on the table: > > > > *) Has anyone done anything like this or know of any current > implementations? > > > > *) Would there be any interest in expanding tablefunc along these > lines? > > > > > > There's something in SQL:2016 that I read as crosstabs, or at least as > > > enabling crosstabs. > > > https://www.iso.org/standard/69776.html > > > > > > If we're going to put work into crosstabs, it seems to me that the > > > "we" needs to be the project as a whole, and the work should be, to > > > the extent reasonable, toward standard compliance. > > > > Interesting. Do you see that the spec (it makes my brain hurt) can > > handle that kind of repivoting? > > I believe the constructs can nest and/or refer to each other, so yes. > > Best, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate > >
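The four pivot steps listed upthread map naturally onto the "hash table style" implementation Merlin mentions. A sketch (Python for brevity rather than tablefunc-style C; the function name, argument shapes, and outer-join-with-None behavior for missing cells are my choices, not an existing API):

```python
from collections import OrderedDict

# Repivot rows of the form (a1..aN, k, c1..cM) into rows of the form
# (a1..aN, c_label, k1..kK): the c columns are stacked vertically and
# the distinct k values are spread horizontally, as described upthread.
def repivot(rows, n_attrs, c_labels):
    k_values = []           # distinct k values, in first-seen order
    cells = OrderedDict()   # (attrs, c_label) -> {k: value}
    for row in rows:
        attrs = tuple(row[:n_attrs])
        k = row[n_attrs]
        if k not in k_values:
            k_values.append(k)
        for label, value in zip(c_labels, row[n_attrs + 1:]):
            cells.setdefault((attrs, label), {})[k] = value
    # Missing cells become None, i.e. outer-join semantics.
    return [attrs + (label,) + tuple(vals.get(k) for k in k_values)
            for (attrs, label), vals in cells.items()]

rows = [
    ("a1", "a2", "a3", "k1", 10, 11),
    ("a1", "a2", "a3", "k2", 20, 21),
]
for out in repivot(rows, n_attrs=3, c_labels=["c1", "c2"]):
    print(out)
# ('a1', 'a2', 'a3', 'c1', 10, 20)
# ('a1', 'a2', 'a3', 'c2', 11, 21)
```

A single pass builds the hash table and a second pass emits rows, which is why this shape should beat the cross-product SQL formulations on large datasets.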
Re: Simplify set of flags used by MyXactFlags
On Fri, Jan 25, 2019 at 12:30:39PM -0300, Alvaro Herrera wrote: > I am, except that the change of "table" to "object" in xact.c line 2262 > makes the new paragraph read a bit weird -- it's now saying "if we've > added a temp object ... Other objects have the same problem". Doesn't > make sense. If you keep that word as "table", it works fine. Indeed. Committed with this suggestion. Thanks for the review. -- Michael
Re: using expression syntax for partition bounds
On Sat, Jan 26, 2019 at 03:14:51AM +0900, Amit Langote wrote: > How about replacing \d+ list_parted with a couple of \d on individual > partitions, like in the attached? That would make it. Why just part_1 and part_3 though? It looks more complete to add part_null and part_2 as well. -- Michael
Re: [PATCH] Allow UNLISTEN during recovery
Shay Rojansky writes: > Thanks for insisting - I ended up setting up the environment and running > the tests, and discovering that some test-related changes were missing. > Here's a 3rd version of the patch. Hope this is now in good shape, let me > know if you think anything else needs to be done. Lotta work for a one-line code change, eh? Pushed now. regards, tom lane
Re: jsonpath
On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov wrote: > Finally, I'm going to commit this if no objections. BTW, I decided to postpone the commit for a few days. Nikita and I are still working on better error messages. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: "SELECT ... FROM DUAL" is not quite as silly as it appears
David Rowley writes: > As far as I can see the patch is ready to go, but I'll defer to Mark, > who's also listed on the reviewer list for this patch. Mark, are you planning to do further review on this patch? I'd like to move it along, since (IMO anyway) we need it in before progress can be made on https://commitfest.postgresql.org/21/1664/ regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra > wrote: >> AFAICS the patch essentially does two things: (a) identifies Append >> paths with a single member and (b) ensures the Vars are properly mapped >> when the Append node is skipped when creating the plan. >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But >> perhaps we could do at least (b) to setrefs.c? > The problem I found with doing var translations in setrefs.c was that > the plan tree is traversed there breadth-first and in createplan.c we > build the plan depth-first. The problem I didn't manage to overcome > with that is that when we replace a "ProxyPath" (actually an Append > path marked as is_proxy=true) it only alter targetlists, etc of nodes > above that in the plan tree. The nodes below it and their targetlists > don't care, as these don't fetch Vars from nodes above them. If we do > the Var translation in setrefs then we've not yet looked at the nodes > that don't care and we've already done the nodes that do care. So the > tree traversal is backwards. I don't quite buy this argument, because you haven't given a reason why it doesn't apply with equal force to setrefs.c's elimination of no-op SubqueryScan nodes. Yet that does work. Stepping back for a minute: even if we get this to work, how often is it going to do anything useful? It seems like most of the other development that's going on is trying to postpone pruning till later, so I wonder how often we'll really know at the beginning of planning that only one child will survive. regards, tom lane
Re: proposal: new polymorphic types - commontype and commontypearray
Pavel Stehule writes: > Four years ago I proposed implicit casting to common type of arguments with > anyelement type. > https://www.postgresql.org/message-id/CAFj8pRCZVo_xoW0cfxt%3DmmgjXKBgr3Gm1VMGL_zx9wDRHmm6Cw%40mail.gmail.com > My proposal was rejected, because it introduced compatibility issues. Yup. > Now I have a solution that doesn't break anything. With two new polymorphic > types: commontype and commontypearray we can write functions like coalesce, > greatest, .. I think this is a good idea at its core, but I don't like the specifics too much. I agree with the basic idea of introducing a second, independent set of polymorphic-type variables. Way back when we first discussed polymorphic types, we thought maybe we should invent anyelement2 and anyarray2, and perhaps even more pairs, to allow polymorphic functions to deal with two or more base types. We didn't do that for lack of convincing examples of the need for it, but I expected some would emerge; I'm rather astonished that we've gotten by for so many years without adding a second set. So where I think we should go with this is to solve that need while we're at it. However, this proposal doesn't do so, because it omits "commonrange". I'm prepared to believe that we don't need "commonenum"; that would presumably have the semantics of "resolve the common type and then it must be an enum". And that seems pretty useless, because there are no type resolution rules that would let us choose one enum out of a set. (I suppose somebody might create implicit casts between some enum types, but it doesn't seem very likely.) I also suspect that we could get away without "commonnonarray". Anynonarray is really just a hack that we invented to avoid ambiguity around the || operator, and an equivalent need would likely not come up for this second set of types. (I could be wrong though; I'm not sure right now whether array_agg's use of anynonarray rather than anyelement is essential or just randomness.) 
But neither of those arguments apply to commonrange; in fact it's highly likely that somebody would want to have "myfunc(commontype, commontype) returns commonrange" as a customized range constructor that can deal with slightly different input types. My second problem with this proposal is that it simply ignores the naming precedent of the existing polymorphic types. We have a convention that polymorphic types are named "any-something", and I do not think we should just toss that overboard. Moreover, if we do end up needing "commonnonarray" or "commonenum", those names are ugly, typo-prone, and unreasonably long. We could do worse than to call these types anyelement2, anyarray2, anyrange2 and just document that their resolution rule depends on finding a common type rather than identical base types. I suppose that's not too pretty --- it reminds one of Oracle finally getting varchar semantics right with varchar2 :-(. Another idea is anyelementc, anyarrayc, anyrangec ("c" for "common") but that's not pretty either. Anyway I think the names need to be any-something. I haven't particularly studied the patch code, but I will note that this sort of change seems pretty dumb:

@@ -953,7 +953,7 @@ make_scalar_array_op(ParseState *pstate, List *opname,
	 * enforce_generic_type_consistency may or may not have replaced a
	 * polymorphic type with a real one.
	 */
-	if (IsPolymorphicType(declared_arg_types[1]))
+	if (IsPolymorphicTypeAny(declared_arg_types[1]))
	{
		/* assume the actual array type is OK */
		res_atypeId = atypeId;

Why would we want to reject the new poly types here? Or just about anyplace else that tests IsPolymorphicType? The argument-type resolution functions themselves need to distinguish the two groups of types, at least for some purposes, but it's very hard to believe anyplace else should do so. regards, tom lane
Re: How does the planner determine plan_rows ?
On Thu, Jan 10, 2019 at 11:41:51PM -0800, Donald Dong wrote: > > > On Jan 10, 2019, at 8:01 PM, Tom Lane wrote: > > > > ... estimate_rel_size() in plancat.c is where to look to find out > > about the planner's default estimates when it's lacking hard data. > > Thank you! Now I see how the planner uses the rows to estimate the cost and > generates the best_plan. > > To me, tracing the function calls is not a simple task. I'm using cscope, and > I use printf when I'm not entirely sure. I was considering using gdb, but I'm > having issues referencing the source code in gdb. > > I'm very interested to learn how the professionals explore the codebase! Uh, the developer FAQ has some info on this: https://wiki.postgresql.org/wiki/Developer_FAQ -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Strange wording in advanced tutorial
On Thu, Jan 10, 2019 at 01:33:43PM +0100, Daniel Gustafsson wrote: > The file header in the advanced tutorial has what seems like incorrect (or at > least odd) wording: "Tutorial on advanced more PostgreSQL features”. Attached > patch changes to “more advanced” which I think is what was the intention. > > I can willingly admit that I had never even noticed the tutorial directory > until I yesterday stumbled across it. The commit introducing the above > wording > is by now old enough to drive. > Agreed, thanks more. ;-) -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Bison state table
On Sat, Jan 26, 2019 at 12:49:50AM +0100, David Fetter wrote: > On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote: > > With our scanner keywords list now more cache-aware, and with us > > planning to use Bison for years to come, has anyone ever looked at > > reordering the bison state machine array to be more cache aware, e.g., > > having common states next to each other rather than scattered around the > > array? > > Do we have a pretty good idea of what commonly grouped states are, or > at least a way to get some not-awful estimates of what they are? Uh, I am afraid we would need to profile the grammar with some sample queries and then base the reordering on that, kind of how compilers use sampling to do branch prediction. Yeah, crazy idea, I know. I figured some smart person might have written a tool to do that. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
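The profile-then-reorder idea above can be illustrated with a toy (Python, and nothing to do with Bison's actual table layout or any existing tool; the function name is mine): collect state-visit frequencies from sample runs, then renumber states so the hottest ones occupy adjacent slots and are more likely to share cache lines.

```python
from collections import Counter

# Toy sketch of profile-guided state renumbering: given a trace of state
# visits from sample queries, assign the lowest (adjacent) slot numbers
# to the most frequently visited states.
def hot_first_numbering(state_trace):
    freq = Counter(state_trace)
    ranked = [state for state, _ in freq.most_common()]
    return {old: new for new, old in enumerate(ranked)}

trace = [7, 7, 7, 3, 3, 42]
print(hot_first_numbering(trace))  # {7: 0, 3: 1, 42: 2}
```

A real attempt would presumably need to consider which states are visited *together* (co-occurrence), not just overall frequency, which is part of what makes the problem harder than this sketch suggests.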
Re: Bison state table
On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote: > With our scanner keywords list now more cache-aware, and with us > planning to use Bison for years to come, has anyone ever looked at > reordering the bison state machine array to be more cache aware, e.g., > having common states next to each other rather than scattered around the > array? Do we have a pretty good idea of what commonly grouped states are, or at least a way to get some not-awful estimates of what they are? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 04:31:00PM -0600, Merlin Moncure wrote: > On Fri, Jan 25, 2019 at 3:16 PM David Fetter wrote: > > > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote: > > > Hackers, > > > > > > We have a strong need to make a variant to the crosstab interface so > > > that data that is pivoted one way would be sent through a crosstab > > > like function so that it would be pivoted another way. For example, > > > if you had > > > > > > row 0: a1, a2, a3, k1, c1, c2, ... > > > row 1: a1, a2, a3, k2, c1, c2, ... > > > row 2: a1, a2, a3, k3, c1, c2, ... > > > ... > > > > > > where 'a' columns are uninteresting attribute columns, 'k' is the > > > dimension we want to pivot on, and c1->cN would be stacked vertically, > > > so that we'd end up with, > > > row 0: a1, a2, a3, c1, k1, k2, ... > > > row 1: a1, a2, a3, c2, k1, k2, ... > > > row 2: a1, a2, a3, c3, k1, k2, ... > > > > > > There are various SQL level approaches to this but they tend to be > > > imperformant with large datasets so that I think a crosstab-like C > > > implementation ought to be able to do better (or at least I hope so) > > > since you have to cross product rows and columns in such a way that > > > you can get a clean join. Cribbing from tablefunc.c I don't think > > > this is a terrible challenge to do in hash table style. > > > > > > Questions on the table: > > > *) Has anyone done anything like this or know of any current > > > implementations? > > > *) Would there be any interest in expanding tablefunc along these lines? > > > > There's something in SQL:2016 that I read as crosstabs, or at least as > > enabling crosstabs. > > https://www.iso.org/standard/69776.html > > > > If we're going to put work into crosstabs, it seems to me that the > > "we" needs to be the project as a whole, and the work should be, to > > the extent reasonable, toward standard compliance. > > Interesting. Do you see that the spec (it makes my brain hurt) can > handle that kind of repivoting? 
I believe the constructs can nest and/or refer to each other, so yes. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 9:50 PM Amit Kapila wrote: > > On Fri, Jan 25, 2019 at 1:03 AM John Naylor > wrote: > > > > On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila > > wrote: > > > I think what doc means to say is > > > that it copies any unlinked files present in primary's new cluster > > > (which in your case will be data2). > > > > In that case, I'm still confused why that doc says, "Unfortunately, > > rsync needlessly copies files associated with temporary and unlogged > > tables because these files don't normally exist on standby servers." > > I fail to see why the primary's new cluster would have these if they > > weren't linked. > > > > Why unlogged files won't be in primary's new cluster? After the > upgrade, they should be present in a new cluster if they were present > in the old cluster. I assume they would be linked, however (I haven't checked this). I did think rewritten VM files would fall under this, but I was confused about unlogged files. > > And in the case we're discussing here, the skipped > > FSMs won't be on data2, so won't end up in standby/data2. > > > > Right. I think we are safe with respect to rsync because I have seen > that we do rewrite the vm files in link mode and rsync will copy them > from primary's new cluster. Okay. > I think you can try to address my other comments on your pg_upgrade > patch. Once we agree on the code, we need to test below scenarios: > (a) upgrade from all supported versions to the latest version > (b) upgrade standby with and without using rsync. Sounds good. -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila wrote: > > 1. > + if ((maps[mapnum].relkind != RELKIND_RELATION && > + maps[mapnum].relkind != RELKIND_TOASTVALUE) || > + first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ || > + GET_MAJOR_VERSION(new_cluster.major_version) <= 1100) > + (void) transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit); > > I think this check will needlessly be performed for future versions as > well, say when one wants to upgrade from PG12 to PG13. That might not > create any problem, but let's try to be more precise. Can you try to > rewrite this check? You might want to encapsulate it inside a > function. I have thought of doing something similar to what we do for > vm, see checks related to VISIBILITY_MAP_FROZEN_BIT_CAT_VER, but I > guess for this patch it is not important to check catalog version as > even if someone tries to upgrade to the same version. Agreed, done for v19 (I've only attached the pg_upgrade patch). > 2. > transfer_relfile() > { > .. > - /* Is it an extent, fsm, or vm file? */ > - if (type_suffix[0] != '\0' || segno != 0) > + /* Did file open fail? */ > + if (stat(old_file, &statbuf) != 0) > .. > } > > So from now onwards, we will call stat for even 0th segment which > means there is one additional system call for each relation, not sure > if that matters, but I think there is no harm in once testing with a > large number of relations say 10K to 50K relations which have FSM. Performance testing is probably a good idea anyway, but I went ahead and implemented your next idea: > The other alternative is we can fetch pg_class.relpages and rely on > that to take this decision, but again if that is not updated, we might > take the wrong decision. We can think of it this way: Which is worse, 1. Transferring a FSM we don't need, or 2. Skipping a FSM we need I'd say #2 is worse. So, in v19 we check pg_class.relpages and if it's a heap and less than or equal the threshold we call stat on the 0th segment to verify. 
In the common case, the cost of the stat call is offset by not linking the FSM. Despite needing another pg_class field, I think this code is actually easier to read than my earlier versions. > 3. > -static void > +static Size > transfer_relfile(FileNameMap *map, const char *type_suffix, bool > vm_must_add_frozenbit) > > If we decide to go with the approach proposed by you, we should add > some comments atop this function for return value change? Done, as well as other comment edits. -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 277d969c31349b22e2c7726935d681e72aacaece Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Fri, 25 Jan 2019 17:18:28 -0500
Subject: [PATCH v19 3/3] During pg_upgrade, conditionally skip transfer of
 FSMs.

If a heap on the old cluster has 4 pages or fewer, don't copy or
link the FSM. This will reduce space usage for installations with
large numbers of small tables.
---
 src/bin/pg_upgrade/info.c        | 18 ++-
 src/bin/pg_upgrade/pg_upgrade.h  |  6 +++
 src/bin/pg_upgrade/relfilenode.c | 84 +++-
 3 files changed, 93 insertions(+), 15 deletions(-)

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2f925f086c..55d4911d10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data,
 	map->old_db_oid = old_db->db_oid;
 	map->new_db_oid = new_db->db_oid;
 
+	map->relpages = old_rel->relpages;
+	map->relkind = old_rel->relkind;
 
 	/*
 	 * old_relfilenode might differ from pg_class.oid (and hence
@@ -415,9 +417,11 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	int			ntups;
 	int			relnum;
 	int			num_rels = 0;
+	int			relpages;
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	char	   *tablespace = NULL;
+	char	   *relkind = NULL;
 	int			i_spclocation,
 				i_nspname,
 				i_relname,
@@ -425,7 +429,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 				i_indtable,
 				i_toastheap,
 				i_relfilenode,
-				i_reltablespace;
+				i_reltablespace,
+				i_relpages,
+				i_relkind;
 	char		query[QUERY_ALLOC];
 	char	   *last_namespace = NULL,
 			   *last_tablespace = NULL;
@@ -494,7 +500,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	 */
 	snprintf(query + strlen(query), sizeof(query) - strlen(query),
 			 "SELECT all_rels.*, n.nspname, c.relname, "
-			 "  c.relfilenode, c.reltablespace, %s "
+			 "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
 			 "FROM (SELECT * FROM regular_heap "
 			 "      UNION ALL "
 			 "      SELECT * FROM toast_heap "
@@ -526,6 +532,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_relfilenode = PQfnumber(res, "relfilenode");
 	i_reltablespace = PQfnumber(res, "reltablespace");
 	i_spclocation = PQfnumber(res, "spclocation");
+	i_relpages = PQfnumber(res, "relpages");
+	i_relkind = PQfnumber(res, "relkind");
Re: GUC parameters accepts values in both octal and hexadecimal formats
On Tue, Jan 8, 2019 at 05:08:41PM +1100, Haribabu Kommi wrote: > Hi Hackers, > > The Server GUC parameters accepts values in both Octal and hexadecimal formats > also. >
> postgres=# set max_parallel_workers_per_gather='0x10';
> postgres=# show max_parallel_workers_per_gather;
>  max_parallel_workers_per_gather
> ---------------------------------
>  16
>
> postgres=# set max_parallel_workers_per_gather='010';
> postgres=# show max_parallel_workers_per_gather;
>  max_parallel_workers_per_gather
> ---------------------------------
>  8
>
> I can check that this behavior exists for quite some time, but I am not able > to > find any documentation related to it? Can some one point me to relevant > section > where it is available? If not exists, is it fine to add it? Well, we call strtol() in guc.c, and the strtol() manual page says: The string may begin with an arbitrary amount of white space (as determined by isspace(3)) followed by a single optional '+' or '-' sign. If base is zero or 16, the string may then include a "0x" prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is '0', in which case it is taken as 8 (octal). so it looks like the behavior is just a side-effect of our strtol call. I am not sure it is worth documenting though. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Alternative to \copy in psql modelled after \g
Hello Tom, Fixing the problem involves deciding what is the right behavior, and updating the documentation and the implementation accordingly. Currently the documentation suggests that :ERROR is about SQL error (subject to interpretation), and the implementation is more or less consistent with that view, but not always, as pointed out by Daniel. FWIW, I think we should adopt the rule that :ERROR reflects any error condition, whether server-side or client-side. I think that everybody agrees with that. It's unlikely that scripts would really not care about client-side errors. Moreover, I do not think we can reliably tell the difference; there are always going to be corner cases that are hard to classify. Example: if we lose the server connection, is that a server-side error or client-side? Or maybe neither, maybe the network went down. Because of this concern, I find the idea of inventing separate SQL_ERROR and CLIENT_ERROR variables to be a pretty terrible one. Then the committer is right:-) I think we'd be creating a lot of make-work and hard choices for ourselves, to do something that most script writers won't give a fig about. If you've got subtle error-handling logic in mind, you shouldn't be trying to implement your app in a psql script in the first place, IMO anyway. Possibly. I was also thinking of debug. ISTM that SQL_ERROR is pretty trivial to implement because we already set SQLSTATE, and SQL_ERROR is probably something like SQLSTATE != "" or close to that. It's unclear to me whether to push ahead with Daniel's existing patch or not. It doesn't look to me like it's making things any worse from the error-consistency standpoint than they were already, so I'd be inclined to consider error semantics cleanup as something to be done separately/later. Fine. BTW, it looks to me like the last patch is still not right as far as when to close copystream --- won't it incorrectly close a stream obtained from pset.copyStream? Argh, I should have checked this point. 
-- Fabien.
Bison state table
With our scanner keywords list now more cache-aware, and with us planning to use Bison for years to come, has anyone ever looked at reordering the bison state machine array to be more cache aware, e.g., having common states next to each other rather than scattered around the array? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Adding a concept of TEMPORARY TABLESPACE for the use in temp_tablespaces
On Sun, Jan 6, 2019 at 11:01:52AM -0800, Mitar wrote: > Hi! > > I have read around the Internet a lot about the idea of using /dev/shm > for a tablespace to put tables in and issues with that. But I still > have not managed to get a good grasp why would that be a bad idea for > using it for temporary objects. I understand that for regular tables > this might prevent database startup and recovery because tables and > all files associated with tables would be missing. While operations > for those tables could reside in the oplog. (Not sure if this means > that unlogged tables can be stored on such tablespace.) > > I have experimented a bit and performance really improves if /dev/shm > is used. I have experimented with creating temporary tables inside a > regular (SSD backed) tablespace /dev/shm and I have seen at least 2x > improvement in time it takes for a set of modification+select queries > to complete. > > I have also tested what happens if I kill all processes with KILL and > restart it. There is noise in logs about missing files, but it does > start up. Dropping and recreating the tablespace works. > > So I wonder, should we add a TEMPORARY flag to a TABLESPACE which > would mark a tablespace such that if at startup its location is empty, > it is automatically recreated, without warnings/errors? (Maybe some > other term could be used for this.) > > Ideally, such tablespace could be set as temp_tablespaces and things > should work out: PostgreSQL should recreate the tablespace before > trying to use temp_tablespaces for the first time. > > We could even make it so that only temporary objects are allowed to be > created in a TEMPORARY TABLESPACE, to make sure user does not make a > mistake. I wrote a blog entry about this: https://momjian.us/main/blogs/pgblog/2017.html#June_2_2017 This is certainly an area we can improve, but it would require changes in several parts of the system to handle cases where the tablespace disappears. 
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 3:16 PM David Fetter wrote: > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote: > > Hackers, > > > > We have a strong need to make a variant to the crosstab interface so > > that data that is pivoted one way would be sent through a crosstab > > like function so that it would be pivoted another way. For example, > > if you had > > > > row 0: a1, a2, a3, k1, c1, c2, ... > > row 1: a1, a2, a3, k2, c1, c2, ... > > row 2: a1, a2, a3, k3, c1, c2, ... > > ... > > > > where 'a' columns are uninteresting attribute columns, 'k' is the > > dimension we want to pivot on, and c1->cN would be stacked vertically, > > so that we'd end up with, > > row 0: a1, a2, a3, c1, k1, k2, ... > > row 1: a1, a2, a3, c2, k1, k2, ... > > row 2: a1, a2, a3, c3, k1, k2, ... > > > > There are various SQL level approaches to this but they tend to be > > imperformant with large datasets so that I think a crosstab-like C > > implementation ought to be able to do better (or at least I hope so) > > since you have to cross product rows and columns in such a way that > > you can get a clean join. Cribbing from tablefunc.c I don't think > > this is a terrible challenge to do in hash table style. > > > > Questions on the table: > > *) Has anyone done anything like this or know of any current > > implementations? > > *) Would there be any interest in expanding tablefunc along these lines? > > There's something in SQL:2016 that I read as crosstabs, or at least as > enabling crosstabs. > https://www.iso.org/standard/69776.html > > If we're going to put work into crosstabs, it seems to me that the > "we" needs to be the project as a whole, and the work should be, to > the extent reasonable, toward standard compliance. Interesting. Do you see that the spec (it makes my brain hurt) can handle that kind of repivoting? merlin
Re: Early WIP/PoC for inlining CTEs
Andreas Karlsson writes: > [ inlining-ctes-v8.patch ] I went ahead and pushed the stuff about QTW_EXAMINE_RTES_BEFORE/_AFTER, because that seems like an independent change with other possible uses. Attached is an updated version of the rest of the patch, with mostly cosmetic changes. I've not touched the documentation, but I think this is otherwise committable if we are satisfied with the semantics. However ... after thinking about it more, I'm not really satisfied with that. In particular I don't like the fact that by default this will inline regardless of the number of references to the CTE. I doubt that inlining when there are multiple references is so likely to be a win as to justify it being the default, especially given that it flies in the face of what our documentation has said for as long as we've had CTEs. Therefore, I'm reversing my previous opinion that we should not have an explicit NOT MATERIALIZED option. I think we should add that, and the behavior ought to be: * No option given: inline if there's exactly one reference. * With MATERIALIZED: never inline. * With NOT MATERIALIZED: inline regardless of the number of references. (Obviously, we should not inline if there's RECURSIVE or the CTE potentially has side-effects, regardless of the user option; I don't think those cases are up for debate.) I haven't done anything about that here, but the changes would be pretty minor. 
regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177eba..6d456f6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2925,6 +2925,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 
 				/* we store the string name because RTE_CTE RTEs need it */
 				APP_JUMB_STRING(cte->ctename);
+				APP_JUMB(cte->ctematerialized);
 				JumbleQuery(jstate, castNode(Query, cte->ctequery));
 			}
 			break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d..8c26dd1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
                                 QUERY PLAN
 -----------------------------------------------------------------
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
    Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1
------+------
  101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165..56602a1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml
Re: ATTACH/DETACH PARTITION CONCURRENTLY
On 2019-Jan-25, Robert Haas wrote: > I finally got a little more time to work on this. It took me a > while to understand that a PartitionedRelPruneInfo assumes that the > indexes of partitions in the PartitionDesc don't change between > planning and execution, because subplan_map[] and subpart_map[] are > indexed by PartitionDesc offset. Right, the planner/executor "disconnect" is one of the challenges, and why I was trying to keep the old copy of the PartitionDesc around instead of building updated ones as needed. > I suppose the reason for this is so > that we don't have to go to the expense of copying the partition > bounds from the PartitionDesc into the final plan, but it somehow > seems a little scary to me. Perhaps I am too easily frightened, but > it's certainly a problem from the point of view of this project, which > wants to let the PartitionDesc change concurrently. Well, my definition of the problem started with the assumption that we would keep the partition array indexes unchanged, so "change concurrently" is what we needed to avoid. Yes, I realize that you've opted to change that definition. I may have forgotten some of your earlier emails on this, but one aspect (possibly a key one) is that I'm not sure we really need to cope, other than with an ERROR, with queries that continue to run across an attach/detach -- moreso in absurd scenarios such as the ones you described where the detached table is later re-attached, possibly to a different partitioned table. I mean, if we can just detect the case and raise an error, and this let us make it all work reasonably, that might be better. > I wrote a little patch that stores the relation OIDs of the partitions > into the PartitionedPruneRelInfo and then, at execution time, does an > Assert() that what it gets matches what existed at plan time. I > figured that a good start would be to find a test case where this > fails with concurrent DDL allowed, but I haven't so far succeeded in > devising one. 
> To make the Assert() fail, I need to come up with a > case where concurrent DDL has caused the PartitionDesc to be rebuilt > but without causing an update to the plan. If I use prepared queries > inside of a transaction block, [...] > I also had the idea of trying to use a cursor, because if I could > start execution of a query, [...] Those are the ways I thought of, and the reason for the shape of some of those .spec tests. I wasn't able to hit the situation. > Maybe if I try it with CLOBBER_CACHE_ALWAYS... I didn't try this one. > Anyway, I think this idea of passing a list of relation OIDs that we > saw at planning time through to the executor and cross-checking them > might have some legs. If we only allowed concurrently *adding* > partitions and not concurrently *removing* them, then even if we find > the case(s) where the PartitionDesc can change under us, we can > probably just adjust subplan_map and subpart_map to compensate, since > we can iterate through the old and new arrays of relation OIDs and > just figure out which things have shifted to higher indexes in the > PartitionDesc. This is all kind of hand-waving at the moment; tips > appreciated. I think detaching partitions concurrently is a necessary part of this feature, so I would prefer not to go with a solution that works for attaching partitions but not for detaching them. That said, I don't see why it's impossible to adjust the partition maps in both cases. But I don't have anything better than hand-waving ATM. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote: > Hackers, > > We have a strong need to make a variant to the crosstab interface so > that data that is pivoted one way would be sent through a crosstab > like function so that it would be pivoted another way. For example, > if you had > > row 0: a1, a2, a3, k1, c1, c2, ... > row 1: a1, a2, a3, k2, c1, c2, ... > row 2: a1, a2, a3, k3, c1, c2, ... > ... > > where 'a' columns are uninteresting attribute columns, 'k' is the > dimension we want to pivot on, and c1->cN would be stacked vertically, > so that we'd end up with, > row 0: a1, a2, a3, c1, k1, k2, ... > row 1: a1, a2, a3, c2, k1, k2, ... > row 2: a1, a2, a3, c3, k1, k2, ... > > There are various SQL level approaches to this but they tend to be > imperformant with large datasets so that I think a crosstab-like C > implementation ought to be able to do better (or at least I hope so) > since you have to cross product rows and columns in such a way that > you can get a clean join. Cribbing from tablefunc.c I don't think > this is a terrible challenge to do in hash table style. > > Questions on the table: > *) Has anyone done anything like this or know of any current implementations? > *) Would there be any interest in expanding tablefunc along these lines? There's something in SQL:2016 that I read as crosstabs, or at least as enabling crosstabs. https://www.iso.org/standard/69776.html If we're going to put work into crosstabs, it seems to me that the "we" needs to be the project as a whole, and the work should be, to the extent reasonable, toward standard compliance. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: ATTACH/DETACH PARTITION CONCURRENTLY
On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera wrote: > Namely: how does this handle the case of partition pruning structure > being passed from planner to executor, if an attach happens in the > middle of it and puts a partition in between existing partitions? Array > indexes of any partitions that appear later in the partition descriptor > will change. I finally got a little more time to work on this. It took me a while to understand that a PartitionedRelPruneInfo assumes that the indexes of partitions in the PartitionDesc don't change between planning and execution, because subplan_map[] and subpart_map[] are indexed by PartitionDesc offset. I suppose the reason for this is so that we don't have to go to the expense of copying the partition bounds from the PartitionDesc into the final plan, but it somehow seems a little scary to me. Perhaps I am too easily frightened, but it's certainly a problem from the point of view of this project, which wants to let the PartitionDesc change concurrently. I wrote a little patch that stores the relation OIDs of the partitions into the PartitionedPruneRelInfo and then, at execution time, does an Assert() that what it gets matches what existed at plan time. I figured that a good start would be to find a test case where this fails with concurrent DDL allowed, but I haven't so far succeeded in devising one. To make the Assert() fail, I need to come up with a case where concurrent DDL has caused the PartitionDesc to be rebuilt but without causing an update to the plan. If I use prepared queries inside of a transaction block, I can continue to run old plans after concurrent DDL has changed things, but I can't actually make the Assert() fail, because the queries continue to use the old plans precisely because they haven't processed invalidation messages, and therefore they also have the old PartitionDesc and everything works. Maybe if I try it with CLOBBER_CACHE_ALWAYS... 
I also had the idea of trying to use a cursor, because if I could start execution of a query, then force a relcache rebuild, then continue executing the query, maybe something would blow up somehow. But that's not so easy because I don't think we have any way using SQL to declare a cursor for a prepared query, so I need to get a query plan that involves run-time pruning without using parameters, which I'm pretty sure is possible but I haven't figured it out yet. And even there the PartitionDirectory concept might preserve us from any damage if the change happens after the executor is initialized, though I'm not sure if there are any cases where we don't do the first PartitionDesc lookup for a particular table until mid-execution. Anyway, I think this idea of passing a list of relation OIDs that we saw at planning time through to the executor and cross-checking them might have some legs. If we only allowed concurrently *adding* partitions and not concurrently *removing* them, then even if we find the case(s) where the PartitionDesc can change under us, we can probably just adjust subplan_map and subpart_map to compensate, since we can iterate through the old and new arrays of relation OIDs and just figure out which things have shifted to higher indexes in the PartitionDesc. This is all kind of hand-waving at the moment; tips appreciated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
crosstab/repivot...any interest?
Hackers, We have a strong need to make a variant to the crosstab interface so that data that is pivoted one way would be sent through a crosstab like function so that it would be pivoted another way. For example, if you had row 0: a1, a2, a3, k1, c1, c2, ... row 1: a1, a2, a3, k2, c1, c2, ... row 2: a1, a2, a3, k3, c1, c2, ... ... where 'a' columns are uninteresting attribute columns, 'k' is the dimension we want to pivot on, and c1->cN would be stacked vertically, so that we'd end up with, row 0: a1, a2, a3, c1, k1, k2, ... row 1: a1, a2, a3, c2, k1, k2, ... row 2: a1, a2, a3, c3, k1, k2, ... There are various SQL level approaches to this but they tend to be imperformant with large datasets so that I think a crosstab-like C implementation ought to be able to do better (or at least I hope so) since you have to cross product rows and columns in such a way that you can get a clean join. Cribbing from tablefunc.c I don't think this is a terrible challenge to do in hash table style. Questions on the table: *) Has anyone done anything like this or know of any current implementations? *) Would there be any interest in expanding tablefunc along these lines? thanks in advance, merlin
Re: move hash_any to utils/hash/hashfn.c
Andres Freund writes: > On 2019-01-25 16:35:18 -0300, Alvaro Herrera wrote: >> Would anybody object too hard if I move hash_any() and friends to >> src/backend/utils/hash/hashfn.c and the declarations to >> src/include/utils/hashutils.h? > I hate the current split quite a bit, albeit for somewhat different > reasons. We make things like tag_hash, uint32_hash unnecessarily more > expensive by adding an entirely useless external function call. And > some of these can be fairly hot (e.g. for syscache). So yea, let's > move this stuff together. +1 regards, tom lane
Re: move hash_any to utils/hash/hashfn.c
Hi, On 2019-01-25 16:35:18 -0300, Alvaro Herrera wrote: > I just noticed (once again) that we have hash_any() declared in > src/access/hash.h (the header file for the hash index AM) rather than > somewhere in utils/, for no good reason other than perhaps there was no > better place when it was introduced. This means that some files that > seem to just need hash_any end up pointlessly #including the whole AM > world upon themselves. > > Would anybody object too hard if I move hash_any() and friends to > src/backend/utils/hash/hashfn.c and the declarations to > src/include/utils/hashutils.h? I hate the current split quite a bit, albeit for somewhat different reasons. We make things like tag_hash, uint32_hash unnecessarily more expensive by adding an entirely useless external function call. And some of these can be fairly hot (e.g. for syscache). So yea, let's move this stuff together. Greetings, Andres Freund
move hash_any to utils/hash/hashfn.c
I just noticed (once again) that we have hash_any() declared in src/access/hash.h (the header file for the hash index AM) rather than somewhere in utils/, for no good reason other than perhaps there was no better place when it was introduced. This means that some files that seem to just need hash_any end up pointlessly #including the whole AM world upon themselves. Would anybody object too hard if I move hash_any() and friends to src/backend/utils/hash/hashfn.c and the declarations to src/include/utils/hashutils.h? -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Fri, Jan 25, 2019 at 08:14:19AM +, Tsunakawa, Takayuki wrote: > Hi Horiguchi-san, Bruce, > > From: Bruce Momjian [mailto:br...@momjian.us] > > I suggest you go with just syscache_prune_min_age, get that into > > PG 12, and we can then reevaluate what we need. If you want to > > hard-code a minimum cache size where no pruning will happen, maybe > > based on the system catalogs or typical load, that is fine. > > Please forgive me if I say something silly (I might have got lost.) > > Are you suggesting to make the cache size limit system-defined and > uncontrollable by the user? I think it's necessary for the DBA to > be able to control the cache memory amount. Otherwise, if many > concurrent connections access many partitions within a not-so-long > duration, then the cache eviction can't catch up and ends up in OOM. > How about the following questions I asked in my previous mail? > > -- > This is a pure question. How can we answer these questions from > users? > > * What value can I set to cache_memory_target when I can use 10 GB for > * the caches and max_connections = 100? How much RAM do I need to > * have for the caches when I set cache_memory_target = 1M? > > The user tends to estimate memory to avoid OOM. Well, let's walk through this. Suppose the default for syscache_prune_min_age is 10 minutes, and that we prune all cache entries unreferenced in the past 10 minutes, or we only prune every 10 minutes if the cache size is larger than some fixed size like 100. So, when would you change syscache_prune_min_age? If you reference many objects and then don't reference them at all for minutes, you might want to lower syscache_prune_min_age to maybe 1 minute. Why would you want to change the behavior of removing all unreferenced cache items, at least when there are more than 100? (You called this syscache_memory_target.) My point is I can see someone wanting to change syscache_prune_min_age, but I can't see someone wanting to change syscache_memory_target. 
Who would want to keep 5k cache entries that have not been accessed in X minutes? If we had some global resource manager that would allow you to control work_mem, maintenance_work_mem, cache size, and set global limits on their sizes, I can see where maybe it might make sense, but right now the memory usage of a backend is so fluid that setting some limit on its size for unreferenced entries just doesn't make sense. One of my big points is that syscache_memory_target doesn't even guarantee that the cache will be this size or lower, it only controls whether the cleanup happens at syscache_prune_min_age intervals. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
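The pruning policy discussed above — evict entries unreferenced for longer than syscache_prune_min_age, but only once the cache exceeds a fixed floor — can be sketched as a standalone fragment. The names and structures here are illustrative only; the real patch operates on PostgreSQL's catcache internals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical floor below which we never prune, per the discussion. */
#define SYSCACHE_MIN_ENTRIES 100

typedef struct CacheEntry
{
    time_t      last_access;    /* last time this entry was referenced */
    bool        dead;           /* marked for removal */
} CacheEntry;

/*
 * Mark entries unreferenced for more than prune_min_age seconds as dead,
 * but only once the cache holds more than a fixed minimum number of
 * entries.  Returns the number of entries pruned.
 */
static int
prune_cache(CacheEntry *entries, size_t nentries,
            time_t now, int prune_min_age)
{
    int         pruned = 0;

    if (nentries <= SYSCACHE_MIN_ENTRIES)
        return 0;               /* small caches are left alone */

    for (size_t i = 0; i < nentries; i++)
    {
        if (!entries[i].dead &&
            now - entries[i].last_access > prune_min_age)
        {
            entries[i].dead = true;
            pruned++;
        }
    }
    return pruned;
}
```

Note that, as in the discussion, the floor only gates whether pruning runs at all; it does not cap the cache at any particular size.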
Re: using expression syntax for partition bounds
On Sat, Jan 26, 2019 at 12:19 AM Tom Lane wrote: > Peter Eisentraut writes: > > committed > > Some of the buildfarm members are having sort-ordering problems > with this. Looks like you could work around it with different > partition names (don't assume the relative sort order of > letters and digits). How about replacing \d+ list_parted with couple of \d on individual partitions, like in the attached? Thanks, Amit 7c079d7417-stabilize-test-output.patch Description: Binary data
Re: Use zero for nullness estimates of system attributes
Jim Finnerty writes: > Tom, there's an analogous issue of adjusting distinct values on a per-column > basis based on the selectivity of other local predicates. Several > commercial RDBMS's make such adjustments in an effort to get better > selectivity estimates when there are multiple local predicates. Is this > something that the PG community has considered and decided not to do because > of the additional planning-time overhead that may be required, or just > something that hasn't been discussed or tackled yet? I think what you're talking about is the correlated-variables problem, which we have made a start on with the "extended statistics" mechanism, though certainly a lot remains to be done. regards, tom lane
Re: Alternative to \copy in psql modelled after \g
Fabien COELHO writes:
> Fixing the problem involves deciding what is the right behavior, and
> updating the documentation and the implementation accordingly. Currently
> the documentation suggests that :ERROR is about SQL errors (subject to
> interpretation), and the implementation is more or less consistent with
> that view, but not always, as pointed out by Daniel.

FWIW, I think we should adopt the rule that :ERROR reflects any error condition, whether server-side or client-side. It's unlikely that scripts would really not care about client-side errors. Moreover, I do not think we can reliably tell the difference; there are always going to be corner cases that are hard to classify. Example: if we lose the server connection, is that a server-side error or client-side? Or maybe neither, maybe the network went down.

Because of this concern, I find the idea of inventing separate SQL_ERROR and CLIENT_ERROR variables to be a pretty terrible one. I think we'd be creating a lot of make-work and hard choices for ourselves, to do something that most script writers won't give a fig about. If you've got subtle error-handling logic in mind, you shouldn't be trying to implement your app in a psql script in the first place, IMO anyway.

It's unclear to me whether to push ahead with Daniel's existing patch or not. It doesn't look to me like it's making things any worse from the error-consistency standpoint than they were already, so I'd be inclined to consider error semantics cleanup as something to be done separately/later.

BTW, it looks to me like the last patch is still not right as far as when to close copystream --- won't it incorrectly close a stream obtained from pset.copyStream? While you could fix that with

-    if (pset.gfname && copystream != NULL)
+    if (!pset.copyStream && pset.gfname && copystream != NULL)

I think that's getting overly complex and unmaintainable. I'd be inclined to code things more like

    FILE       *copystream = NULL;
    bool        need_close = false;
    ...
    need_close = openQueryOutputFile(...);
    ...
    if (need_close)
        // close copystream here

regards, tom lane
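Tom's sketch, filled out as a self-contained fragment. Here open_output() is a stand-in for psql's openQueryOutputFile() (an assumption for illustration): it routes output either to a default stream that must not be closed, or to a newly opened file, and reports which case applies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Stand-in for openQueryOutputFile(): with no filename, use stdout (and
 * report that nothing needs closing); otherwise open the file and report
 * that the caller must fclose() it.
 */
static bool
open_output(const char *fname, FILE **fout)
{
    if (fname == NULL)
    {
        *fout = stdout;         /* default destination; do not close */
        return false;
    }
    *fout = fopen(fname, "w");
    return (*fout != NULL);     /* true => caller owns the stream */
}

static int
write_result(const char *fname, const char *data)
{
    FILE       *copystream = NULL;
    bool        need_close = false;

    need_close = open_output(fname, &copystream);
    if (copystream == NULL)
        return -1;              /* could not open the output file */

    fputs(data, copystream);

    if (need_close)
        fclose(copystream);     /* close only streams we opened here */
    return 0;
}
```

The point of the flag is exactly the one raised above: a stream handed in from elsewhere (e.g. pset.copyStream) is never closed by this code path.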
Re: Connection slots reserved for replication
Hello,

> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI wrote:

Thank you for the review. I took the liberty of addressing it with v9.

> Documentation looks fine for me. By the way, a comment for the
> caller-site of CheckRequiredParameterValues() in xlog.c looks
> somewhat stale.
>
>> /* Check to see if any changes to max_connections give problems */
>> CheckRequiredParameterValues();
>
> may be better something like this?:
>
>> /* Check to see if any parameter change gives a problem on replication */

I changed it to "Check if any parameter change gives a problem on recovery", as I think it is independent of the [streaming] replication, but I still don't like the wording, as it just duplicates the comment at the definition of CheckRequiredParameterValues. Maybe remove the comment altogether?

> In postinit.c:
>> /*
>>  * The last few connection slots are reserved for superusers.
>>  */
>> if ((!am_superuser && !am_walsender) &&
>>     ReservedBackends > 0 &&
>
> This is forgetting to explain about walsenders.
>
>> The last few connection slots are reserved for
>> superusers. Replication connections don't share the same slot
>> pool.
>
> Or something?

I changed it to

+    * The last few connection slots are reserved for superusers.
+    * Replication connections are drawn from a separate pool and
+    * not limited by max_connections or superuser_reserved_connections.

> And the parentheses enclosing "!am_superuser..walsender" seem no
> longer needed.
>
> In guc.c:
> - /* see max_connections and max_wal_senders */
> + /* see max_connections */
>
> I don't understand for what reason we should see max_connections
> just above. (Or even max_wal_senders) Do we need this comment?
> (There're some other instances, but they won't be for this patch.)

I don't understand what it is pointing to as well, so I've removed it.
> In pg_controldata.c:
> + printf(_("max_wal_senders setting: %d\n"),
> +        ControlFile->max_wal_senders);
>   printf(_("max_worker_processes setting: %d\n"),
>          ControlFile->max_worker_processes);
>   printf(_("max_prepared_xacts setting: %d\n"),
>
> The added item seems to want some additional spaces.

Good catch, fixed.

Attached is v9. I also bumped up PG_CONTROL_VERSION to 1200 per the prior comment by Robert.

Cheers,
Oleksii

replication_reserved_connections_v9.patch
Description: Binary data
Re: pg_upgrade: Pass -j down to vacuumdb
Alvaro Herrera writes: > So let's have it write with a $VACUUMDB_OPTS variable, which is by > default defined as empty but with a comment suggesting that maybe the > user wants to add the -j option. This way, if they have to edit it, > they only have to edit the VACUUMDB_OPTS line instead of each of the two > vacuumdb lines. Even better, allow the script to absorb a value from the environment, so that it needn't be edited at all. regards, tom lane
Re: pg_upgrade: Pass -j down to vacuumdb
On 2019-Jan-25, Peter Eisentraut wrote: > On 25/01/2019 11:28, Michael Paquier wrote: > > based on that linking the value used by pg_upgrade and vacuumdb is a > > bad concept in my opinion, and the patch should be rejected. More > > documentation on pg_upgrade side to explain that a bit better could be > > a good idea though, as it is perfectly possible to use your own > > post-upgrade script or rewrite partially the generated one. > > Right. pg_upgrade doesn't actually call vacuumdb. It creates a script > that you may use. The script itself contains a comment that says, if > you want to do this as fast as possible, don't use this script. That > comment could be enhanced to suggest the use of the -j option. So let's have it write with a $VACUUMDB_OPTS variable, which is by default defined as empty but with a comment suggesting that maybe the user wants to add the -j option. This way, if they have to edit it, they only have to edit the VACUUMDB_OPTS line instead of each of the two vacuumdb lines.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical decoding of two-phase transactions
Eyeballing 0001, it has a few problems.

1. It's under-parenthesizing the txn argument of the macros.

2. The "has"/"is" macro definitions don't return booleans -- see fce4609d5e5b.

3. The remainder of this no longer makes sense:

     /* Do we know this is a subxact? Xid of top-level txn if so */
-    bool          is_known_as_subxact;
     TransactionId toplevel_xid;

I suggest to fix the comment, and also improve the comment next to the macro that tests this flag.

(4. The macro names are ugly.)

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Use zero for nullness estimates of system attributes
Edmund Horner writes:
> I added some code to selfuncs.c to estimate the selectivity of CTID,
> including nullness, in my ongoing attempt to add TID range scans [1]. And
> as Tom pointed out [2], no system attribute can be null, so we might as
> well handle them all.
> That's what the attached patch does.

Seems pretty uncontroversial, so pushed.

> I observed a few interesting things with outer join selectivity:
> While system attributes aren't NULL in the table, they can be in queries
> such as:

Yeah, none of our selectivity calculations account for the possibility that we're above a join that has affected the distribution of a Var's values. Going to NULL in an outer join is just part of that issue. I don't feel this patch needs to solve it, and anyway it'd be a rather massive rethink.

> Finally: I thought about introducing a macro to attnum.h:
>
> #define AttrNumberIsForSystemAttr(attributeNumber) \
>     ((bool) ((attributeNumber) < 0))
>
> But there's a zillion places that could be changed to use it, so I haven't
> in this version of the patch.

I can't get too excited about that. Even if the reader is unfamiliar with the negative-attno convention, most of these places are commented in a way that makes it clear what's going on.

regards, tom lane
Re: Alternative to \copy in psql modelled after \g
Bonsoir Daniel,

>> Sure. As there are several bugs (doubtful features) uncovered, a first
>> patch could fix "COPY ... TO STDOUT \g file", but probably replicate the
>> current ERROR behavior however debatable it is (i.e. your patch without
>> the ERROR change, which is unrelated to the bug being fixed), and then
>> another patch should fix/modify the behavior around ERROR (everywhere
>> and consistently), and probably IMHO add an SQL_ERROR.
>
> It's not as if the patch issued an explicit call SetVariable("ERROR",
> "true") that could be commented, or something like that. The assignment
> of the variable happens as a consequence of patched code that aims at
> being correct in its error handling. So I'm for leaving this decision to
> a maintainer, because I don't agree with your conclusion that the current
> patch should be changed in that regard.

Ok, fine with me.

To summarize the debate, when fixing "\g filename" for "COPY TO STDOUT" the patch does not set error consistently on:

  sql> COPY (SELECT 1) TO STDOUT \g /BAD
  # Permission denied on "/BAD"
  -> :ERROR is true
  # the command is executed but could not write to the file

compared to the existing behavior:

  sql> SELECT 1 \g BAD
  # Permission denied on "/BAD"
  -> :ERROR is false
  # the SQL command is fine, although writing to the file failed

Should we include this inconsistency and then fix the problem later, or replicate the initial (doubtful?) behavior for consistency and then fix the problem later, anyway?

Fixing the problem involves deciding what is the right behavior, and updating the documentation and the implementation accordingly. Currently the documentation suggests that :ERROR is about SQL errors (subject to interpretation), and the implementation is more or less consistent with that view, but not always, as pointed out by Daniel.

Note that as the author of the initial patch adding :ERROR and others (69835bc8), I'm probably somehow responsible for this situation, so shame on me.

-- 
Fabien.
Re: Simplify set of flags used by MyXactFlags
On 2019-Jan-25, Michael Paquier wrote: > On Thu, Jan 24, 2019 at 08:56:05AM -0300, Alvaro Herrera wrote: > > Uh, I didn't think it was necessary nor desirable to rename the flag, > > only the user-visible message. > > Oh, OK. I have overstated your comment then. Are you fine with the > attached instead? The flag name remains the same, and the comment is > updated. I am, except that the change of "table" to "object" in xact.c line 2262 makes the new paragraph read a bit weird -- it's now saying "if we've added a temp object ... Other objects have the same problem". Doesn't make sense. If you keep that word as "table", it works fine.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: insensitive collations
On 16/01/2019 21:50, Peter Eisentraut wrote: > On 16/01/2019 14:20, Daniel Verite wrote: >> I've found another issue with aggregates over distinct: >> the deduplication seems to ignore the collation. > > I have a fix for that. I'll send it with the next update. Another patch. This fixes your issue, and it incorporates the findings from the thread "ExecBuildGroupingEqual versus collations", as well as a few other fixes and more tests. As far as I can tell, this covers everything now, meaning all the relevant plan types propagate the collation correctly and all the relevant operators and functions do the right things with them.

-- 
Peter Eisentraut    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 1d675c51fb40a23b355ad5aa3de382e9fd4ffc82 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 25 Jan 2019 16:13:00 +0100
Subject: [PATCH v4] Collations with nondeterministic comparison

This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison).
---
 contrib/bloom/bloom.h                            |   1 +
 contrib/bloom/blutils.c                          |   3 +-
 doc/src/sgml/catalogs.sgml                       |   7 +
 doc/src/sgml/charset.sgml                        |  11 +-
 doc/src/sgml/func.sgml                           |   6 +
 doc/src/sgml/ref/create_collation.sgml           |  22 +
 src/backend/access/hash/hashfunc.c               |  86 +++
 src/backend/catalog/pg_collation.c               |   2 +
 src/backend/commands/collationcmds.c             |  25 +-
 src/backend/executor/execExpr.c                  |   4 +-
 src/backend/executor/execGrouping.c              |  12 +-
 src/backend/executor/execPartition.c             |   1 +
 src/backend/executor/execReplication.c           |   5 +-
 src/backend/executor/nodeAgg.c                   |   9 +-
 src/backend/executor/nodeGroup.c                 |   1 +
 src/backend/executor/nodeHash.c                  |  14 +-
 src/backend/executor/nodeHashjoin.c              |   5 +
 src/backend/executor/nodeRecursiveunion.c        |   1 +
 src/backend/executor/nodeSetOp.c                 |   2 +
 src/backend/executor/nodeSubplan.c               |  14 +-
 src/backend/executor/nodeUnique.c                |   1 +
 src/backend/executor/nodeWindowAgg.c             |   2 +
 src/backend/nodes/copyfuncs.c                    |   7 +
 src/backend/nodes/outfuncs.c                     |   7 +
 src/backend/nodes/readfuncs.c                    |   7 +
 src/backend/optimizer/plan/createplan.c          |  54 +-
 src/backend/optimizer/util/tlist.c               |  25 +
 src/backend/partitioning/partbounds.c            |   4 +-
 src/backend/partitioning/partprune.c             |   3 +-
 src/backend/regex/regc_pg_locale.c               |   5 +
 src/backend/utils/adt/arrayfuncs.c               |   2 +-
 src/backend/utils/adt/like.c                     |  27 +-
 src/backend/utils/adt/name.c                     |  32 +-
 src/backend/utils/adt/orderedsetaggs.c           |   3 +-
 src/backend/utils/adt/pg_locale.c                |   1 +
 src/backend/utils/adt/ri_triggers.c              |  39 +-
 src/backend/utils/adt/varchar.c                  | 117 +++
 src/backend/utils/adt/varlena.c                  | 166 +++--
 src/backend/utils/cache/catcache.c               |   9 +-
 src/backend/utils/cache/lsyscache.c              |  16 +
 src/bin/initdb/initdb.c                          |   4 +-
 src/bin/pg_dump/pg_dump.c                        |  39 +-
 src/bin/psql/describe.c                          |  17 +-
 src/include/catalog/pg_collation.h               |   2 +
 src/include/executor/executor.h                  |   3 +
 src/include/executor/hashjoin.h                  |   1 +
 src/include/executor/nodeHash.h                  |   2 +-
 src/include/nodes/execnodes.h                    |   3 +
 src/include/nodes/plannodes.h                    |   7 +
 src/include/optimizer/planmain.h                 |   2 +-
 src/include/optimizer/tlist.h                    |   1 +
 src/include/partitioning/partbounds.h            |   1 +
 src/include/utils/lsyscache.h                    |   1 +
 src/include/utils/pg_locale.h                    |   1 +
 .../regress/expected/collate.icu.utf8.out        | 702 +-
 .../regress/expected/collate.linux.utf8.out      |  25 +-
 src/test/regress/expected/collate.out            |  15 +
 src/test/regress/expected/subselect.out          |  19 +
 src/test/regress/sql/collate.icu.utf8.sql        | 230 ++
 src/test/regress/sql/collate.linux.utf8.sql      |   8 +
 src/test/regress/sql/collate.sql                 |   5 +
 src/test/regress/sql/subselect.sql               |  17 +
 src/test/subscription/Makefile                   |   2 +
Re: using expression syntax for partition bounds
Peter Eisentraut writes: > committed Some of the buildfarm members are having sort-ordering problems with this. Looks like you could work around it with different partition names (don't assume the relative sort order of letters and digits). regards, tom lane
Re: House style for DocBook documentation?
On 01/25/19 09:01, Peter Eisentraut wrote: > On 19/01/2019 21:24, Chapman Flack wrote: >> (thinks to self half-seriously about an XSL transform for generating >> printed output that could preserve link-texted links, add raised numbers, >> and produce a numbered URLs section at the back) > > External links already create footnotes in the PDF output. Is that > different from what you are saying? That might, only, indicate that I was just thinking aloud in email and had not gone and checked in PDF output to see how the links were handled. Yes, it could very possibly indicate that. If they are already processed that way, does that mean the item

  o Do not use text with [...] so the URL appears in printed output

in README.links should be considered obsolete, and removed even, and doc authors should feel free to put link text in without hesitation?

Regards, -Chap
Re: Speeding up text_position_next with multibyte encodings
On 15/01/2019 02:52, John Naylor wrote:
> The majority of cases are measurably faster, and the best case is at
> least 20x faster. On the whole I'd say this patch is a performance win
> even without further optimization. I'm marking it ready for committer.

I read through the patch one more time, tweaked the comments a little bit, and committed. Thanks for the review!

I did a little profiling of the worst case, where this is slower than the old approach. There's a lot of function call overhead coming from walking the string with pg_mblen(). That could be improved. If we inlined pg_mblen() into the loop, it becomes much faster, and I think this code would be faster even in the worst case. (Except for the very worst cases, where the hash table with the new code happens to have a collision at a different point than before, but that doesn't seem like a fair comparison.)

I think this is good enough as it is, but if I have the time, I'm going to try optimizing the pg_mblen() loop, as well as similar loops e.g. in pg_mbstrlen(). Or if someone else wants to give that a go, feel free.

- Heikki
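For illustration, here is what inlining the length computation into the scanning loop looks like, using UTF-8 length rules as a stand-in (the server dispatches pg_mblen() per encoding, so this is a sketch of the optimization idea, not the actual patch):

```c
#include <assert.h>
#include <stddef.h>

/*
 * UTF-8 character length from the lead byte.  Marking it inline lets the
 * compiler fold it into the caller's loop, avoiding a per-character
 * function call -- the overhead Heikki's profiling points at.
 */
static inline int
utf8_mblen_inline(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Count characters, pg_mbstrlen-style, without per-character calls. */
static size_t
mbstrlen_inline(const char *s)
{
    size_t      n = 0;

    while (*s != '\0')
    {
        s += utf8_mblen_inline((unsigned char) *s);
        n++;
    }
    return n;
}
```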
Re: [HACKERS] logical decoding of two-phase transactions
On 14/01/2019 23:16, Arseny Sher wrote: > > Nikhil Sontakke writes: > >> I'd like to believe that the latest patch set tries to address some >> (if not all) of your concerns. Can you please take a look and let me >> know? > > Hi, sure. > > General things: > > - Earlier I said that there is no point of sending COMMIT PREPARED if > decoding snapshot became consistent after PREPARE, i.e. PREPARE hadn't > been sent. I realized since then that such use cases actually exist: > prepare might be copied to the replica by e.g. basebackup or something > else earlier. Basebackup does not copy slots, though, and a slot should not reach consistency until all prepared transactions are committed, no? > > - BTW, ReorderBufferProcessXid at PREPARE should be always called > anyway, because otherwise if xact is empty, we will not prepare it > (and call cb), even if the output plugin asked us not to filter it > out. However, we will call commit_prepared cb, which is inconsistent. > > - I find it weird that in DecodePrepare and in DecodeCommit you always > ask the plugin whether to filter an xact, given that sometimes you > know beforehand that you are not going to replay it: it might have > already been replayed, might have wrong dbid, origin, etc. One > consequence of this: imagine that notorious xact with PREPARE before > point where snapshot became consistent and COMMIT PREPARED after that > point. Even if filter_cb says 'I want 2PC on this xact', with current > code it won't be replayed on PREPARE and rbxid will be destroyed with > ReorderBufferForget. Now this xact is lost. Yeah, this is wrong.

> Second patch:
>
> + /* filter_prepare is optional, but requires two-phase decoding */
> + if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
> +     ereport(ERROR,
> +             (errmsg("Output plugin does not support two-phase decoding, but "
> +                     "registered filter_prepared callback.")));
>
> I actually think that enable_twophase output plugin option is
> redundant.
> If plugin author wants 2PC, he just provides
> filter_prepare_cb callback and potentially others.

+1

> I also don't see much
> value in checking that exactly 0 or 3 callbacks were registered.

I think that check makes sense; if you support 2PC you need to register all callbacks.

> Nitpicking:
>
> First patch: I still don't think that these flags need a bitmask.

Since we are discussing this, I personally prefer the bitmask here.

-- 
Petr Jelinek    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
pg_basebackup, walreceiver and wal_sender_timeout
Hi, hackers.

When running pg_basebackup with -X s with a network file system as the target, we would consistently get "could not receive data from WAL stream: server closed the connection unexpectedly". On the server we see this error firing: "terminating walsender process due to replication timeout".

The problem occurs when the network or the file system is acting very slowly. One example of such a case looks like this (strace output for fsync calls):

  0.033383 fsync(8) = 0 <20.265275>
  20.265399 fsync(8) = 0 <0.11>
  0.022892 fsync(7) = 0 <48.350566>
  48.350654 fsync(7) = 0 <0.05>
  0.000674 fsync(8) = 0 <0.851536>
  0.851619 fsync(8) = 0 <0.07>
  0.67 fsync(7) = 0 <0.06>
  0.45 fsync(7) = 0 <0.05>
  0.031733 fsync(8) = 0 <0.826957>
  0.827869 fsync(8) = 0 <0.16>
  0.005344 fsync(7) = 0 <1.437103>
  1.446450 fsync(6) = 0 <0.063148>
  0.063246 fsync(6) = 0 <0.06>
  0.000381 +++ exited with 1 +++

So the longest fsync call took 48 seconds, but how would that result in a termination if wal_sender_timeout is (default) 60 seconds? The problem is in the way walsender handles this timeout:

  /*
   * If half of wal_sender_timeout has lapsed without receiving any reply
   * from the standby, send a keep-alive message to the standby requesting
   * an immediate reply.
   */

Obviously the receiver cannot respond immediately while in a syscall. This begs the question: why is the GUC handled the way it is? What would be the correct way to solve this? Shall we change the behaviour, or do a better job of explaining the implications of wal_sender_timeout in the docs?

Regards, Nick.
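The timeout handling being discussed can be sketched like this (the names and millisecond bookkeeping are illustrative, not the actual walsender code):

```c
#include <assert.h>

typedef enum
{
    WS_OK,                      /* nothing to do yet */
    WS_SEND_KEEPALIVE,          /* request an immediate reply */
    WS_TERMINATE                /* "replication timeout" */
} WalSenderAction;

/*
 * After half of wal_sender_timeout without a reply, request a keepalive;
 * after the full timeout, terminate the connection.  A timeout of 0
 * disables the mechanism entirely.
 */
static WalSenderAction
check_reply_timeout(long now_ms, long last_reply_ms,
                    long wal_sender_timeout_ms)
{
    long        elapsed = now_ms - last_reply_ms;

    if (wal_sender_timeout_ms <= 0)
        return WS_OK;
    if (elapsed >= wal_sender_timeout_ms)
        return WS_TERMINATE;
    if (elapsed >= wal_sender_timeout_ms / 2)
        return WS_SEND_KEEPALIVE;
    return WS_OK;
}
```

This makes the failure mode concrete: a receiver stuck for ~48 seconds in fsync() cannot answer the keepalive requested at the 30-second mark, so with the default 60-second timeout the sender terminates the connection — matching the trace above.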
Re: [HACKERS] logical decoding of two-phase transactions
Hi,

I think the difference between abort and abort prepared should be explained better (I am not quite sure I get it myself).

> + The required abort_cb callback is called whenever

Also, why is this one required when all the 2PC stuff is optional?

> +static void
> +DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
> +              xl_xact_parsed_prepare * parsed)
> +{
> +    XLogRecPtr  origin_lsn = parsed->origin_lsn;
> +    TimestampTz commit_time = parsed->origin_timestamp;
> +    XLogRecPtr  origin_id = XLogRecGetOrigin(buf->record);
> +    TransactionId xid = parsed->twophase_xid;
> +    bool        skip;
> +
> +    Assert(parsed->dbId != InvalidOid);
> +    Assert(TransactionIdIsValid(parsed->twophase_xid));
> +
> +    /* Whether or not this PREPARE needs to be skipped. */
> +    skip = DecodeEndOfTxn(ctx, buf, parsed, xid);
> +
> +    FinalizeTxnDecoding(ctx, buf, parsed, xid, skip);

Given that DecodeEndOfTxn calls SnapBuildCommitTxn, won't this make the catalog changes done by a prepared transaction visible to other transactions (which is undesirable as they should only be visible after it's committed)?

> + if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
> +              !(IsCatalogRelation(scan->rs_rd) ||
> +                RelationIsUsedAsCatalogTable(scan->rs_rd))))
> +     ereport(ERROR,
> +             (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> +              errmsg("improper heap_getnext call")));

I think we should log the relation oid as well, so that plugin developers have an easier time debugging this (for all variants of this).

-- 
Petr Jelinek    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: House style for DocBook documentation?
On 19/01/2019 21:24, Chapman Flack wrote: > (thinks to self half-seriously about an XSL transform for generating > printed output that could preserve link-texted links, add raised numbers, > and produce a numbered URLs section at the back) External links already create footnotes in the PDF output. Is that different from what you are saying? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_upgrade: Pass -j down to vacuumdb
Hi Kirk,

On 1/24/19 9:31 PM, Jamison, Kirk wrote:
> According to the CF app, this patch needs review so I took a look at it.
> Currently, your patch applies and builds cleanly, and all tests are also
> successful based on the CF bot patch tester.
>
> I'm not sure if I have understood the argument raised by Peter correctly.
> Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the
> same way, so it's not a given that the same -j number should be used."
> I think it's whether the # of jobs for pg_upgrade is used the same way for
> parallel vacuum.
>
> According to the official docs, the recommended setting for pg_upgrade's
> -j option is the maximum of the number of CPU cores and tablespaces. [1]
> As for vacuumdb, parallel vacuum's (-j) recommended setting is based on
> your max_connections [2], which is the max # of concurrent connections to
> your db server.

Thanks for your feedback! As per Peter's comments, I have changed the patch (v2) to not pass down the -j option to vacuumdb. Only the documentation and console output are updated, in order to make this clearer.

Best regards,
Jesper
Re: Alternative to \copy in psql modelled after \g
Fabien COELHO wrote: > Sure. As there are several bugs (doubtful features) uncovered, a first > patch could fix "COPY ...TO STDOUT \g file", but probably replicate ERROR > current behavior however debatable it is (i.e. your patch without the > ERROR change, which is unrelated to the bug being fixed), and then another > patch should fix/modify the behavior around ERROR (everywhere and > consistently), and probably IMHO add an SQL_ERROR. It's not as if the patch issued an explicit call SetVariable("ERROR", "true") that could be commented, or something like that. The assignment of the variable happens as a consequence of patched code that aims at being correct in its error handling. So I'm for leaving this decision to a maintainer, because I don't agree with your conclusion that the current patch should be changed in that regard. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Almost bug in COPY FROM processing of GB18030 encoded input
On 24/01/2019 23:27, Robert Haas wrote: On Wed, Jan 23, 2019 at 6:23 AM Heikki Linnakangas wrote: I happened to notice that when CopyReadLineText() calls mblen(), it passes only the first byte of the multi-byte characters. However, pg_gb18030_mblen() looks at the first and the second byte. CopyReadLineText() always passes \0 as the second byte, so pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded characters as 2. It works out fine, though, because the second half of the 4-byte encoded character always looks like another 2-byte encoded character, in GB18030. CopyReadLineText() is looking for delimiter and escape characters and newlines, and only single-byte characters are supported for those, so treating a 4-byte character as two 2-byte characters is harmless. Yikes. Committed the comment changes, so it's less of a gotcha now. - Heikki
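The GB18030 length rule in question, paraphrased from the server's pg_gb18030_mblen(): the length is decided by the first two bytes, so probing with a '\0' second byte classifies a 4-byte character as 2 bytes — which, as described above, happens to be harmless because the trailing two bytes of a 4-byte character themselves look like a 2-byte character:

```c
#include <assert.h>

/*
 * Paraphrase of pg_gb18030_mblen(): ASCII is 1 byte; a high-bit-set lead
 * byte followed by a digit (0x30-0x39) starts a 4-byte character;
 * anything else with a high-bit-set lead byte is a 2-byte character.
 */
static int
gb18030_mblen(const unsigned char *s)
{
    if (s[0] < 0x80)
        return 1;               /* ASCII */
    if (s[1] >= 0x30 && s[1] <= 0x39)
        return 4;               /* 4-byte form: second byte is a digit */
    return 2;                   /* 2-byte form */
}
```

Passing a lead byte with a '\0' second byte (the CopyReadLineText() case) falls into the final branch and returns 2 regardless of the character's true length.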
Re: backslash-dot quoting in COPY CSV
Bruce Momjian wrote:

> but I am able to see the failure using STDIN:
>
>   COPY test FROM STDIN CSV;
>   Enter data to be copied followed by a newline.
>   End with a backslash and a period on a line by itself, or an EOF signal.
>   "foo
>   \.
>   ERROR: unterminated CSV quoted field
>   CONTEXT: COPY test, line 1: ""foo
>
> This seems like a bug to me. Looking at the code, psql issues the
> prompts for STDIN, but when it sees \. alone on a line, it has no idea
> you are in a quoted CSV string, so it thinks the copy is done and sends
> the result to the server. I can't see an easy way to fix this. I guess
> we could document it.

Thanks for looking into this. \copy from a file with csv is also affected, since it uses COPY FROM STDIN behind the scenes. The case of embedded data looks more worrying, because psql will execute the data following \. as if it were SQL statements. ISTM that only ON_ERROR_STOP=on prevents the risk of SQL injection in that scenario, but it's off by default. What about this idea: when psql is feeding COPY data from its command stream and an error occurs, it should act as if ON_ERROR_STOP were "on" even if it's not.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Re: pg_upgrade: Pass -j down to vacuumdb
On 25/01/2019 11:28, Michael Paquier wrote: > based on that linking the value used by pg_upgrade and vacuumdb is a > bad concept in my opinion, and the patch should be rejected. More > documentation on pg_upgrade side to explain that a bit better could be > a good idea though, as it is perfectly possible to use your own > post-upgrade script or rewrite partially the generated one. Right. pg_upgrade doesn't actually call vacuumdb. It creates a script that you may use. The script itself contains a comment that says, if you want to do this as fast as possible, don't use this script. That comment could be enhanced to suggest the use of the -j option. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On 25/01/2019 09:52, Takashi Menjo wrote: > Heikki Linnakangas wrote: >> To re-iterate what I said earlier in this thread, I think the next step >> here is to write a patch that modifies xlog.c to use plain old >> mmap()/msync() to memory-map the WAL files, to replace the WAL buffers. > Sorry but my new patchset still uses PMDK, because PMDK is supported on > Linux > _and Windows_, and I think someone may want to test this patchset on > Windows... When you manage the WAL (or perhaps in the future relation files) through PMDK, is there still a file system view of it somewhere, for browsing, debugging, and for monitoring tools? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
postgres_fdw: estimate_path_cost_size fails to re-use cached costs
Hi,

I noticed yet another thing while updating the patch for pushing down ORDER BY LIMIT. Let me explain. When costing foreign paths on the basis of local statistics, we calculate/cache the costs of an unsorted foreign path, and re-use them to estimate the costs of presorted foreign paths, as shown below. BUT: we fail to re-use them for some typical queries, such as "select * from ft1 order by a", due to fpinfo->rel_startup_cost=0, leading to doing the same cost calculation repeatedly.

    /*
     * We will come here again and again with different set of pathkeys
     * that caller wants to cost. We don't need to calculate the cost of
     * bare scan each time. Instead, use the costs if we have cached them
     * already.
     */
    if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
    {
        startup_cost = fpinfo->rel_startup_cost;
        run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
    }

I think we should use "fpinfo->rel_startup_cost >= 0" here, not "fpinfo->rel_startup_cost > 0". Also, it would be possible that the total cost calculated is zero in corner cases (eg, seq_page_cost=0 and cpu_tuple_cost=0 for the example), so I think we should change the total cost part as well. Attached is a patch for that.

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
*************** estimate_path_cost_size(PlannerInfo *root,
*** 2625,2631 ****
  	 * bare scan each time. Instead, use the costs if we have cached them
  	 * already.
  	 */
! 	if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
  	{
  		startup_cost = fpinfo->rel_startup_cost;
  		run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
--- 2625,2631 ----
  	 * bare scan each time. Instead, use the costs if we have cached them
  	 * already.
  	 */
! 	if (fpinfo->rel_startup_cost >= 0 && fpinfo->rel_total_cost >= 0)
  	{
  		startup_cost = fpinfo->rel_startup_cost;
  		run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
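The failure mode is easy to reproduce in miniature: with "cost > 0" as the cached-ness test, a legitimately zero startup cost is recomputed on every call, whereas a -1 sentinel plus ">= 0" re-uses it. The structures and names here are illustrative, not the actual postgres_fdw ones:

```c
#include <assert.h>

typedef struct PathCostCache
{
    double      rel_startup_cost;   /* -1 means "not computed yet" */
    double      rel_total_cost;
} PathCostCache;

static int  ncomputations = 0;      /* counts actual cost calculations */

static void
get_costs(PathCostCache *c, double *startup, double *total)
{
    if (c->rel_startup_cost >= 0 && c->rel_total_cost >= 0)
    {
        /* re-use the cached bare-scan costs */
        *startup = c->rel_startup_cost;
        *total = c->rel_total_cost;
        return;
    }

    /* pretend this is the expensive estimate; zero is a valid result */
    ncomputations++;
    c->rel_startup_cost = 0.0;
    c->rel_total_cost = 25.0;

    *startup = c->rel_startup_cost;
    *total = c->rel_total_cost;
}
```

With the buggy "> 0" test instead, the cached startup cost of 0 would never satisfy the condition, so ncomputations would grow with every set of pathkeys costed.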
Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS
On 26/12/2018 07:07, Tatsuro Yamada wrote:
> +#define Query_for_list_of_attribute_numbers \
> +"SELECT attnum "\
> +"  FROM pg_catalog.pg_attribute a, "\
> +"       pg_catalog.pg_class c "\
> +" WHERE c.oid = a.attrelid "\
> +"   AND a.attnum > 0 "\
> +"   AND NOT a.attisdropped "\
> +"   /* %d %s */" \
> +"   AND a.attrelid = (select oid from pg_catalog.pg_class where relname = '%s') "\
> +"   AND pg_catalog.pg_table_is_visible(c.oid) "\
> +"order by a.attnum asc "

This needs a bit of refinement. You need to handle quoted index names (see nearby Query_for_list_of_attributes), and you should also complete partial numbers (e.g., if I type 1 then complete 10, 11, ... if appropriate).
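The "complete partial numbers" refinement amounts to prefix-matching the typed digits against the candidate attribute numbers. A hypothetical standalone helper, not psql's actual tab-complete machinery:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: given the partially typed number and the candidate
 * attribute numbers, return how many candidates match the typed prefix,
 * e.g. typing "1" matches 1, 10 and 11 but not 2 or 21. */
static int
complete_attnum(const char *typed, const int *attnums, int n,
				int *matches, int max)
{
	int			found = 0;

	for (int i = 0; i < n && found < max; i++)
	{
		char		buf[16];

		snprintf(buf, sizeof(buf), "%d", attnums[i]);
		if (strncmp(buf, typed, strlen(typed)) == 0)
			matches[found++] = attnums[i];
	}
	return found;
}
```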
Re: Implicit timezone conversion replicating from timestamp to timestamptz?
On 24/01/2019 21:57, Jeremy Finzel wrote:
> The source system has a timestamp stored "at time zone UTC", like this
> for 6:00pm Chicago time:
> 2019-01-24 20:00:00.00
>
> I was *very surprised* to find that replicating above timestamp to
> timestamptz actually does so correctly, showing this value in my psql
> client on the subscriber:
> 2019-01-24 14:00:00.00-06
>
> How does it know/why does it assume it knows that the time zone of the
> timestamp data type is UTC on the provider given that my clusters are at
> America/Chicago?

This only works by accident in pglogical because the binary representations of both types are compatible in this sense. You're not really supposed to do that.

> I would have actually expected an incorrect conversion
> of the data unless I set the timezone to UTC on the way in on the
> subscriber via a trigger.
>
> That is, I was expecting to see this:
> 2019-01-24 20:00:00.00-06

This is what you get in built-in logical replication, because it goes via textual representation. To make it do what you actually wanted, set the time zone for the subscription worker to UTC. The way to do that (could be easier) is to create a separate user, use ALTER USER SET timezone = 'UTC', and create the subscription as that user.
Re: using expression syntax for partition bounds
On 24/01/2019 13:57, Amit Langote wrote:
> The if (contain_var_clause(value)) block is new code, but I agree its
> ereport should have parser_errposition just like other ereports in that
> function. Fixed that in the attached.

committed
Re: pg_upgrade: Pass -j down to vacuumdb
On Fri, Jan 25, 2019 at 02:31:57AM +, Jamison, Kirk wrote:
> I'm not sure if I have understood the argument raised by Peter
> correctly. Quoting Peter, "it's not clear that pg_upgrade and
> vacuumdb are bound the same way, so it's not a given that the same
> -j number should be used." I think it's whether the # jobs for
> pg_upgrade is used the same way for parallel vacuum.

vacuumdb and pg_upgrade are designed for different purposes and have different properties, hence using a value of -j for one does not necessarily mean that the same value should be used for the other. An ANALYZE is nice because it is non-intrusive for a live production system, and if you begin to pass down a -j argument you may finish by eating more resources than would have been necessary for the same job. For some users perhaps that matters, for some it does not. So based on that, linking the value used by pg_upgrade and vacuumdb is a bad concept in my opinion, and the patch should be rejected. More documentation on the pg_upgrade side to explain that a bit better could be a good idea though, as it is perfectly possible to use your own post-upgrade script or partially rewrite the generated one.
--
Michael
Re: Simplify set of flags used by MyXactFlags
On Thu, Jan 24, 2019 at 08:56:05AM -0300, Alvaro Herrera wrote:
> Uh, I didn't think it was necessary nor desirable to rename the flag,
> only the user-visible message.

Oh, OK. I have overstated your comment then. Are you fine with the attached instead? The flag name remains the same, and the comment is updated.
--
Michael

From 2b9bb74106800b9c9b3fc95fe4f141858abbde02 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Fri, 25 Jan 2019 19:18:39 +0900
Subject: [PATCH] Simplify 2PC restriction handling for temporary objects

There are two flags used to control access to temporary tables and
access to the temporary namespace of a session, however the first
control flag is actually a concept included in the second.  This
removes the flag for temporary table tracking, keeping around only the
one at namespace level.
---
 src/backend/access/common/relation.c  |  4 ++--
 src/backend/access/transam/xact.c     | 21 ++
 src/backend/commands/lockcmds.c       |  2 +-
 src/backend/commands/tablecmds.c      |  2 +-
 src/include/access/xact.h             | 12 +++---
 .../expected/test_extensions.out      |  2 +-
 src/test/regress/expected/temp.out    | 22 +--
 7 files changed, 26 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index beec34f126..41a0051f88 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -71,7 +71,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)

 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;

 	pgstat_initstats(r);
@@ -121,7 +121,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)

 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;

 	pgstat_initstats(r);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7c3a9c1e89..38434a199e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2259,13 +2259,18 @@ PrepareTransaction(void)
 	/* NOTIFY will be handled below */

 	/*
-	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
+	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary object in
 	 * this transaction.  Having the prepared xact hold locks on another
 	 * backend's temp table seems a bad idea --- for instance it would prevent
 	 * the backend from exiting.  There are other problems too, such as how to
 	 * clean up the source backend's local buffers and ON COMMIT state if the
 	 * prepared xact includes a DROP of a temp table.
 	 *
+	 * Other objects types, like functions, operators or extensions, share the
+	 * same restriction as they should not be created, locked or dropped as
+	 * this can mess up with this session or even a follow-up session trying
+	 * to use the same temporary namespace.
+	 *
 	 * We must check this after executing any ON COMMIT actions, because they
 	 * might still access a temp relation.
 	 *
@@ -2273,22 +2278,10 @@ PrepareTransaction(void)
 	 * cases, such as a temp table created and dropped all within the
 	 * transaction.  That seems to require much more bookkeeping though.
 	 */
-	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPREL))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
-
-	/*
-	 * Similarly, PREPARE TRANSACTION is not allowed if the temporary
-	 * namespace has been involved in this transaction as we cannot allow it
-	 * to create, lock, or even drop objects within the temporary namespace
-	 * as this can mess up with this session or even a follow-up session
-	 * trying to use the same temporary namespace.
-	 */
 	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on temporary namespace")));
+				 errmsg("cannot PREPARE a transaction that has operated on temporary objects")));

 	/*
 	 * Likewise, don't allow PREPARE after pg_export_snapshot.  This could be
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index f4da564e01..43bba717f2 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -108,7 +108,7 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid,
 	 */
 	relpersistence = get_rel_persistence(relid);
 	if (relpersistence == RELPERSISTENCE_TEMP)
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;

 	/* Check permissions. */
 	aclresult = LockTableAclCheck(relid, lockmode,
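The consolidation in this patch rests on the observation that accessing a temp relation is just one way of touching the temp namespace, so one namespace-level flag suffices for the PREPARE TRANSACTION check. A toy model of that idea (macro and helper names are made up, not the real xact.h definitions):

```c
#include <assert.h>

/* One namespace-level flag covers the old relation-level one: any access
 * to a temp relation also counts as touching the temp namespace. */
#define XFLAG_ACCESSEDTEMPNAMESPACE	(1U << 0)
#define XFLAG_ACCESSEDTEMPREL		XFLAG_ACCESSEDTEMPNAMESPACE	/* folded in */

static unsigned my_xact_flags = 0;

static void
open_temp_relation(void)
{
	/* sets the namespace-level bit as a side effect */
	my_xact_flags |= XFLAG_ACCESSEDTEMPREL;
}

static int
prepare_allowed(void)
{
	/* PREPARE TRANSACTION refused once temp objects were touched */
	return (my_xact_flags & XFLAG_ACCESSEDTEMPNAMESPACE) == 0;
}
```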
RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Hello,

On behalf of Yoshimi, I rebased the patchset onto the latest master (e3565fd6). Please see the attachment. It also includes an additional bug fix (in patch 0002) about a temporary filename.

Note that PMDK 1.4.2+ supports the MAP_SYNC and MAP_SHARED_VALIDATE flags, so please use a new version of PMDK when you test. The latest version is 1.5.

Heikki Linnakangas wrote:
> To re-iterate what I said earlier in this thread, I think the next step
> here is to write a patch that modifies xlog.c to use plain old
> mmap()/msync() to memory-map the WAL files, to replace the WAL buffers.

Sorry but my new patchset still uses PMDK, because PMDK is supported on Linux _and Windows_, and I think someone may want to test this patchset on Windows...

Regards,
Takashi

-- 
Takashi Menjo - NTT Software Innovation Center

Attachments:
0001-Add-configure-option-for-PMDK-v2.patch
0002-Read-write-WAL-files-using-PMDK-v2.patch
0003-Walreceiver-WAL-IO-using-PMDK-v2.patch
Re: proposal - plpgsql unique statement id
Thu 24. 1. 2019 at 23:08, Tom Lane wrote:
> Peter Eisentraut writes:
>> committed
>
> Why didn't this patch modify the dumping logic in pl_funcs.c to print
> the IDs? I'm not aware of other cases where we intentionally omit
> fields from debug-support printouts.

Currently we don't print lineno, which is maybe more important information for the user. I looked at the code, and now I think it is a little bit harder than I expected. Any new information can break the output formatting:

static void
dump_loop(PLpgSQL_stmt_loop *stmt)
{
	dump_ind();
	printf("LOOP\n");

	dump_stmts(stmt->body);

	dump_ind();
	printf("ENDLOOP\n");
}

can look like

static void
dump_loop(PLpgSQL_stmt_loop *stmt, int stmtid_width)
{
	dump_ind();
	printf("%*d LOOP\n", stmtid_width, stmt->stmtid);

	dump_stmts(stmt->body);

	dump_ind();
	printf("     ENDLOOP\n");
}

Is that something like what you expect?

Regards

Maybe simpler:

static void
dump_loop(PLpgSQL_stmt_loop *stmt, int stmtid_width)
{
	dump_ind();
	printf("LOOP {%d}\n", stmt->stmtid);

	dump_stmts(stmt->body);

	dump_ind();
	printf("ENDLOOP\n");
}

Pavel

> regards, tom lane
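Pavel's alignment question above boils down to printf's "*" width specifier, which takes the field width as a runtime argument, so the stmtid column width can be computed once per function and passed down. A small standalone sketch (the helper is hypothetical, not plpgsql code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a dump line with the statement id right-aligned in a column of
 * the given width, as "%*d" does in the proposed dump_loop() variant. */
static int
format_stmt(char *buf, size_t len, int width, int stmtid, const char *kw)
{
	return snprintf(buf, len, "%*d %s", width, stmtid, kw);
}
```

With width 4, stmtid 7 renders as three spaces, "7", then the keyword, so nested statement ids line up in one column regardless of their digit count.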
RE: Protect syscache from bloating with negative cache entries
Hi Horiguchi-san, Bruce,

From: Bruce Momjian [mailto:br...@momjian.us]
> I suggest you go with just syscache_prune_min_age, get that into PG 12,
> and we can then reevaluate what we need. If you want to hard-code a
> minimum cache size where no pruning will happen, maybe based on the system
> catalogs or typical load, that is fine.

Please forgive me if I say something silly (I might have got lost.)

Are you suggesting making the cache size limit system-defined and uncontrollable by the user? I think it's necessary for the DBA to be able to control the cache memory amount. Otherwise, if many concurrent connections access many partitions within a not-so-long duration, the cache eviction can't catch up and the session ends up in OOM.

How about the following questions I asked in my previous mail?

--
This is a pure question. How can we answer these questions from users?

* What value can I set to cache_memory_target when I can use 10 GB for the caches and max_connections = 100?
* How much RAM do I need to have for the caches when I set cache_memory_target = 1M?

The user tends to estimate memory to avoid OOM.
--

Regards
Takayuki Tsunakawa