Re: MERGE ... RETURNING
On Thu, Feb 29, 2024 at 1:49 PM Jeff Davis wrote:
> Can we get some input on whether the current MERGE ... RETURNING patch
> is the right approach from a language standpoint?

MERGE_CLAUSE_NUMBER() seems really out of place to me; identifying the output set by number rather than by some kind of name feels wrong. I did not see a lot of support for that position, though.

merlin
Re: Re: How to solve the problem of one backend process crashing and causing other processes to restart?
On Mon, Nov 13, 2023 at 3:14 AM yuansong wrote:
> Enhancing the overall fault tolerance of the entire system for this
> feature is quite important. No one can avoid bugs, and I don't believe that
> this approach is a more advanced one. It might be worth considering adding
> it to the roadmap so that interested parties can conduct relevant research.
>
> The current major issue is that when one process crashes, resetting all
> connections has a significant impact on other connections. Is it possible
> to only disconnect the crashed connection and have the other connections
> simply roll back the current transaction without reconnecting? Perhaps this
> problem can be mitigated through the use of a connection pool.
>
> If we want to implement this feature, would it be sufficient to clean up
> or restore the shared memory and disk changes caused by the crashed
> backend? Besides clearing various known locks, what else needs to be
> changed? Does anyone have any insights? Please help me. Thank you.

One thing that's really key to understand about postgres is that there is a different set of rules regarding what is the database's job to solve vs. what belongs to supporting libraries and frameworks. It isn't that hard to wait and retry a query in most applications, and it is up to you to do that. There are also various connection poolers that might implement retry logic, and not having to work through those concerns keeps the code lean and has other benefits. While postgres might someday implement things like a built-in connection pooler, O_DIRECT-style memory management, and things like that, there are long-term costs to doing them.

There's another side to this. Suppose I had to choose between a hypothetical postgres that had some kind of process-local crash recovery and the current implementation.
I might still choose the current implementation because, in general, crashes are good, and the full reset has a much better chance of clearing the underlying issue that caused the problem vs managing the symptoms of it. merlin
Re: MERGE ... RETURNING
On Wed, Nov 1, 2023 at 5:12 AM Dean Rasheed wrote: > On Tue, 31 Oct 2023 at 23:19, Vik Fearing wrote: > > > > On 10/31/23 19:28, Jeff Davis wrote: > > > > > Assuming we have one RETURNING clause at the end, then it creates the > > > problem of how to communicate which WHEN clause a tuple came from, > > > whether it's the old or the new version, and/or which action was > > > performed on that tuple. > > > > > > How do we communicate any of those things? We need to get that > > > information into the result table somehow, so it should probably be > > > some kind of expression that can exist in the RETURNING clause. But > > > what kind of expression? > > > > > > (a) It could be a totally new expression kind with a new keyword (or > > > recycling some existing keywords for the same effect, or something that > > > looks superficially like a function call but isn't) that's only valid > > > in the RETURNING clause of a MERGE statement. If you use it in another > > > expression (say the targetlist of a SELECT statement), then you'd get a > > > failure at parse analysis time. > > > > This would be my choice, the same as how the standard GROUPING() > > "function" for grouping sets is implemented by GroupingFunc. > > > > Something I'm wondering about is to what extent this discussion is > driven by concerns about aspects of the implementation (specifically, > references to function OIDs in code), versus a desire for a different > user-visible syntax. To a large extent, those are orthogonal > questions. > > (As an aside, I would note that there are already around a dozen > references to specific function OIDs in the parse analysis code, and a > lot more if you grep more widely across the whole of the backend > code.) > > At one point, as I was writing this patch, I went part-way down the > route of adding a new node type (I think I called it MergeFunc), for > these merge support functions, somewhat inspired by GroupingFunc. 
> In the end, I backed out of that approach, because it seemed to be
> introducing a lot of unnecessary additional complexity, and I decided
> that a regular FuncExpr would suffice.
>
> If pg_merge_action() and pg_merge_when_clause_number() were
> implemented using a MergeFunc node, it would reduce the number of
> places that refer to specific function OIDs. Basically, a MergeFunc
> node would be very much like a FuncExpr node, except that it would
> have a "levels up" field, set during parse analysis, at the point
> where we check that it is being used in a merge returning clause, and
> this field would be used during subselect planning. Note, however,
> that that doesn't entirely eliminate references to specific function
> OIDs -- the parse analysis code would still do that. Also, additional
> special-case code in the executor would be required to handle
> MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need
> adjusting, and anything else like that.
>
> A separate question is what the syntax should be. We could invent a
> new syntax, like GROUPING(). Perhaps:
>
> MERGE(ACTION) instead of pg_merge_action()
> MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number()

Hm, still struggling with this merge action and (especially) number stuff. Currently we have:

WHEN MATCHED [ AND *condition* ] THEN { *merge_update* | *merge_delete* | DO NOTHING }
WHEN NOT MATCHED [ AND *condition* ] THEN { *merge_insert* | DO NOTHING }

What about extending to something like:

WHEN MATCHED [ AND *condition* ] [ AS *merge_clause_name* ] THEN ...

for example:

WHEN MATCHED AND tid > 2 AS giraffes THEN
  UPDATE SET balance = t.balance + delta

...and have pg_merge_clause() return 'giraffes' (of name type). If a merge clause is not identified, maybe don't return any data for that clause through RETURNING, or return NULL.

Maybe the RETURNING clause doesn't have to be extended or molested in any way; it would follow the same mechanics as for UPDATE, and could not refer to identified merge_clauses, but would allow pg_merge_clause() to function. You wouldn't need to identify action or number. For example:

with r as
(
  merge into x using y on x.a = y.a
  when matched and x.c > 0 as good then do nothing
  when matched and x.c <= 0 as bad then do nothing
  returning pg_merge_clause(), x.*
) ...

...yielding:

 pg_merge_clause | a | c
-----------------+---+---
 good            | 1 | 5
 good            | 2 | 7
 bad             | 3 | 0

...and maybe allow pg_merge_clause() to optionally take a column name:

  returning pg_merge_clause('result'), x.*

...yielding:

 result | a | c
--------+---+---
 good   | 1 | 5
 good   | 2 | 7
 bad    | 3 | 0

Food for thought -- may have missed some finer details upthread.

merlin
Re: MERGE ... RETURNING
On Tue, Oct 24, 2023 at 2:11 PM Jeff Davis wrote:
> On Wed, 2023-08-23 at 11:58 +0100, Dean Rasheed wrote:
> > Updated version attached, fixing an uninitialized-variable warning
> > from the cfbot.
>
> I took another look and I'm still not comfortable with the special
> IsMergeSupportFunction() functions. I don't object necessarily -- if
> someone else wants to commit it, they can -- but I don't plan to commit
> it in this form.
>
> Can we revisit the idea of a per-WHEN RETURNING clause? The returning
> clauses could be treated kind of like a UNION, which makes sense
> because it really is a union of different results (the returned tuples
> from an INSERT are different than the returned tuples from a DELETE).
> You can just add constants to the target lists to distinguish which
> WHEN clause they came from.
>
> I know you rejected that approach early on, but perhaps it's worth
> discussing further?

Yeah. As a side benefit, the 'action_number' felt really out of place, and this might neatly solve that. It doesn't match tg_op, for example -- with the current approach, why not return a text, or an enum? Why doesn't it match concepts that are pretty well established elsewhere? SQL has a pretty good track record of not inventing weird numbers with no real meaning (sadly, not so much the developers).

Having said that, pg_merge_action() doesn't feel too bad if the syntax issues can be worked out. Very supportive of the overall goal.

merlin
Memory knob testing (was Re: Let's make PostgreSQL multi-threaded)
On Fri, Aug 25, 2023 at 8:35 AM Stephen Frost wrote:
> Greetings,
>
> This is getting a bit far afield in terms of this specific thread, but
> there's an ongoing effort to give PG administrators knobs to be able to
> control how much actual memory is used rather than depending on the
> kernel to actually tell us when we're "out" of memory. There'll be new
> patches for the September commitfest posted soon. If you're interested
> in this issue, it'd be great to get more folks involved in review and
> testing.

Noticed I missed this. I'm interested. Test #1 would be to set memory to about the max there is, maybe a hair under, turn off swap, and see what happens in various dynamic load situations. Disabling overcommit is not a practical solution in my experience; it moves instability from one place to another and seems to make problems appear in a broader set of situations. For zero-downtime platforms it has a place, but I would tend to roll the dice on a reboot even for direct user-facing applications, given that it can provide relief for systemic conditions.

My unsophisticated hunch is that postgres and the kernel are not on the same page about memory somehow, and that the multi-process architecture might be contributing to that issue. Of course, regarding rearchitecture skeptically and realistically is a good idea given the effort and risks. I guess, in summary, I would personally rate things like better management of resource tradeoffs, better handling of transient demands, predictable failure modes, and stability in dynamic workloads over things like better performance in extremely high concurrency situations. Others might think differently for objectively good reasons.

merlin
Re: Request for comment on setting binary format output per session
On Wed, Oct 4, 2023 at 9:17 AM Peter Eisentraut wrote:
> I think intuitively, this facility ought to work like client_encoding.
> There, the client declares its capabilities, and the server has to
> format the output according to the client's capabilities. That works,
> and it also works through connection poolers. (It is a GUC.) If we can
> model it like that as closely as possible, then we have a chance of
> getting it working reliably. Notably, the value space for
> client_encoding is a globally known fixed list of strings. We need to
> figure out what is the right way to globally identify types, like either
> by fully-qualified name, by base name, some combination, how does it
> work with extensions, or do we need a new mechanism like UUIDs. I think
> that is something we need to work out, no matter which protocol
> mechanism we end up using.

Fantastic write-up.

> globally known fixed list of strings

Are you suggesting that we would have a client/server negotiation such as 'jdbc', 'all', etc., where that would identify which types are done which way? If you did that, why would we need to promote names/UUIDs to a permanent global space?

merlin
Re: dubious warning: FORMAT JSON has no effect for json and jsonb types
On Wed, Aug 16, 2023 at 8:55 AM Peter Eisentraut wrote:
> This warning comes from parse_expr.c transformJsonValueExpr() and is
> triggered for example by the following test case:
>
> SELECT JSON_OBJECT('foo': NULL::json FORMAT JSON);
> WARNING: FORMAT JSON has no effect for json and jsonb types
>
> But I don't see anything in the SQL standard that would require this
> warning. It seems pretty clear that FORMAT JSON in this case is
> implicit and otherwise without effect.
>
> Also, we don't have that warning in the output case (RETURNING json
> FORMAT JSON).
>
> Anyone remember why this is here? Should we remove it?

+1 for removing, on the basis that the behavior is not surprising and the warning would pollute logs for most configurations.

merlin
Re: Let's make PostgreSQL multi-threaded
On Thu, Jul 27, 2023 at 8:28 AM David Geier wrote:
> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.

Hm, I noted this upthread, but asking again: does this improve interactions with the operating system enough to make OOM-kill situations less likely? These things are the bane of my existence, and I'm having a hard time finding a solution that prevents them other than running pgbouncer and lowering max_connections, which adds complexity. I suspect I'm not the only one dealing with this. What's really scary about these situations is they come without warning. Here's a pretty typical example per sar -r:

          kbmemfree kbmemused %memused kbbuffers  kbcached kbcommit %commit kbactive  kbinact kbdirty
14:20:02     461612  15803476    97.16         0  11120280 12346980   60.35 10017820  4806356     220
14:30:01     378244  15886844    97.67         0  11239012 12296276   60.10 10003540  4909180     240
14:40:01     308632  15956456    98.10         0  11329516 12295892   60.10 10015044  4981784     200
14:50:01     458956  15806132    97.18         0  11383484 12101652   59.15  9853612  5019916     112
15:00:01   10592736   5672352    34.87         0   4446852  8378324   40.95  1602532  3473020     264  <-- reboot!
15:10:01    9151160   7113928    43.74         0   5298184  8968316   43.83  2714936  3725092     124
15:20:01    8629464   7635624    46.94         0   6016936  8777028   42.90  2881044  4102888     148
15:30:01    8467884   7797204    47.94         0   6285856  8653908   42.30  2830572  4323292     436
15:40:02    8077480   8187608    50.34         0   6828240  8482972   41.46  2885416  4671620     320
15:50:01    7683504   8581584    52.76         0   7226132  8511932   41.60  2998752  4958880     308
16:00:01    7239068   9026020    55.49         0   7649948  8496764   41.53  3032140  5358388     232
16:10:01    7030208   9234880    56.78         0   7899512  8461588   41.36  3108692  5492296     216

The triggering query was heavy (maybe even runaway), but server load was minimal otherwise:

          CPU  %user %nice %system %iowait %steal %idle
14:30:01  all   9.55  0.00    0.63    0.02   0.00 89.81
14:40:01  all   9.95  0.00    0.69    0.02   0.00 89.33
14:50:01  all  10.22  0.00    0.83    0.02   0.00 88.93
15:00:01  all  10.62  0.00    1.63    0.76   0.00 86.99
15:10:01  all   8.55  0.00    0.72    0.12   0.00 90.61

The conjecture here is that lots of idle connections make the server appear to have less memory available than it actually has, and sudden transient demands can cause it to destabilize. Just throwing it out there: if this can be shown to help, it may be supportive of moving forward with something like it, either instead of, or along with, O_DIRECT or other internalized database memory management strategies. Lowering context switches, faster page access, etc. are of course nice, but would not be a game changer for the workloads we see, which are pretty varied (OLTP, analytics), although we don't see extremely high transaction rates.

merlin
Re: Let's make PostgreSQL multi-threaded
On Mon, Jun 5, 2023 at 12:25 PM Heikki Linnakangas wrote:
> We currently bend over backwards to make all allocations fixed-sized in
> shared memory. You learn to live with that, but a lot of things would be
> simpler if you could allocate and free in shared memory more freely.
> It's no panacea, you still need to be careful with locking and
> concurrency. But a lot simpler.

Would this help with the OOM killer in Linux? Isn't it true that pgbouncer provides a lot of the same benefits?

merlin
Re: Request for comment on setting binary format output per session
On Thu, Apr 20, 2023 at 2:52 PM Dave Cramer wrote:
>
> As promised here is a patch with defines for all of the protocol messages.
> I created a protocol.h file and put it in src/includes
> I'm fairly sure that some of the names I used may need to be changed but
> the grunt work of finding and replacing everything is done.

In many cases, converting an inline character literal to a macro eliminates the need for the inline comment, e.g.:

+ case SIMPLE_QUERY: /* simple query */

...that's more work obviously; do you agree, and if so would you like some help going through that?

merlin
Re: Request for comment on setting binary format output per session
On Mon, Apr 3, 2023 at 11:29 AM Dave Cramer wrote:
> > > participating clients to receive GUC configured format. I do not
> > > think that libpq's result format being able to be overridden by GUC
> > > is a good idea at all, the library has to participate, and I
> > > think can be made to do so without adjusting the interface (say, by
> > > resultFormat = 3).
> >
> > Interesting idea. I suppose you'd need to specify 3 for all result
> > columns? That is a protocol change, but wouldn't "break" older clients.
> > The newer clients would need to make sure that they're connecting to
> > v16+, so using the protocol version alone wouldn't be enough. Hmm.
>
> So this only works with extended protocol and not simple queries.
> Again, as Peter mentioned it's already easy enough to confuse psql using
> binary cursors so it makes sense to fix psql either way.
>
> If you use resultFormat (3) I think you'd still end up doing the Describe
> (which we are trying to avoid) to make sure you could receive all the
> columns in binary.

Can you elaborate on why Describe would have to be passed? Agreed that would be a dealbreaker if so. If you pass a 3 sending it in, then you'd be checking PQfformat on data coming back as 0/1 -- or at least that'd be smart, since you're indicating the server is able to choose the format. This addresses the concern that libpq clients currently passing resultformat zero could not have that decision overridden by the server, which I also think is a dealbreaker. There might be other reasons why a Describe message may be forced, however.

merlin
Re: Request for comment on setting binary format output per session
On Wed, Mar 29, 2023 at 11:04 AM Jeff Davis wrote:
> I'm not clear on what proposal you are making and/or endorsing?

ha -- was just backing up dave's GUC idea.

> 1. Fix our own clients, like psql, to check for binary data they can't
> process.

This ought to be impossible IMO. All libpq routines except PQexec have an explicit expectation on format (via the resultformat parameter) that should not be overridden. PQexec ought to be explicitly documented and wired to only request text format data. resultformat can be extended now or later to allow participating clients to receive the GUC-configured format. I do not think that libpq's result format being able to be overridden by GUC is a good idea at all; the library has to participate, and I think it can be made to do so without adjusting the interface (say, by resultFormat = 3). Similarly, in the JDBC world, it ought to be up to the driver to determine when it wants the server to flex wire formats, but it must be able to override the server's decision.

merlin
Re: Request for comment on setting binary format output per session
On Tue, Mar 21, 2023 at 4:47 PM Jeff Davis wrote:
> On Tue, 2023-03-21 at 09:22 -0400, Dave Cramer wrote:
> > As Jeff mentioned there is a visibility problem if the search path is
> > changed. The simplest solution IMO is to look up the OID at the time
> > the format is requested and use the OID going forward to format the
> > output as binary. If the search path changes and a type with the same
> > name is now first in the search path then the data would be returned
> > in text.
>
> Am I over-thinking this?

I think so. Dave's idea puts a lot of flexibility into the client side, and that's good. Search path mechanics are really well understood and well integrated with extensions already (CREATE EXTENSION ... SCHEMA), assuming the precise time at which UDTs are looked up in an unqualified way is made very clear to, or invoked via, the client code. I'll say it again though: OIDs really ought to be considered a transient cache of type information rather than a permanent identifier.

Regarding UDTs, lots of common and useful scenarios (containers, enums, ranges, etc.) do not require special knowledge to parse beyond the kind of type it is. Automatic type creation from tables is one of the most genius things about postgres, and directly wiring client structures to them through binary is really nifty. This undermines the case that binary parsing requires special knowledge IMO; UDTs might in fact be the main long-term target... I could see scenarios where java classes might be glued directly to postgres tables by the driver... this would be a lot more efficient than using json, which is how everyone does it today.

Then again, maybe *I* might be overthinking this.

merlin
Re: Request for comment on setting binary format output per session
On Tue, Mar 21, 2023 at 8:22 AM Dave Cramer wrote: > > On Tue, 21 Mar 2023 at 07:35, Merlin Moncure wrote: > >> >> >> On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer wrote: >> >>> >>> >>> >>> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure wrote: >>> >>>> >>>> >>>> On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer >>>> wrote: >>>> >>>>> >>>>> OIDs are a pain to deal with IMO. They will not survive a dump style >>>> restore, and are hard to keep synchronized between databases...type names >>>> don't have this problem. OIDs are an implementation artifact that ought >>>> not need any extra dependency. >>>> >>> AFAIK, OID's for built-in types don't change. >>> Clearly we need more thought on how to deal with UDT's >>> >> >> Yeah. Not having a solution that handles arrays and composites though >> would feel pretty incomplete since they would be the one of the main >> beneficiaries from a performance standpoint. >> > I don't think arrays of built-in types are a problem; drivers already know > how to deal with these. > > >> I guess minimally you'd need to expose some mechanic to look up oids, but >> being able to specify "foo"."bar", in the GUC would be pretty nice (albeit >> a lot more work). >> > > As Jeff mentioned there is a visibility problem if the search path is > changed. > Only if the name is not fully qualified. By allowing OID to bypass visibility, it stands to reason visibility ought to be bypassed for type requests as well, or at least be able to be. If we are setting things in GUC, that suggests we can establish things in postgresql.conf, and oids feel out of place there. Yep. Your rationale is starting to click. How would this interact with >> existing code bases? >> > Actually JDBC wasn't the first to ask for this. 
> Default result formats should be settable per session ·
> postgresql-interfaces/enhancement-ideas · Discussion #5 (github.com)
> <https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5>
> I've tested it with JDBC and it requires no code changes on our end. Jack tested
> it and it required no code changes on his end either. He did some
> performance tests and found "At 100 rows the text format takes 48% longer
> than the binary format."
> https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5#discussioncomment-3188599

Yeah, the general need is very clear IMO.

> > I get that JDBC is the main target, but how does this interact with libpq
> > code that explicitly sets resultformat?
>
> Honestly I have no idea how it would function with libpq. I presume if the
> client did not request binary format then things would work as they do
> today.

I see your argument here, but IMO this is another nudge away from a GUC, unless you're willing to establish that behavior. The thinking here is that the GUC wouldn't do anything for libpq use cases, and couldn't, since resultformat would be overriding the behavior in all interesting cases... it seems odd to implement server-side-specified behavior that the client library doesn't implement.

merlin
Re: Request for comment on setting binary format output per session
On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer wrote:
> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure wrote:
> > On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer wrote:
> >
> > OIDs are a pain to deal with IMO. They will not survive a dump style
> > restore, and are hard to keep synchronized between databases...type names
> > don't have this problem. OIDs are an implementation artifact that ought
> > not need any extra dependency.
>
> AFAIK, OID's for built-in types don't change.
> Clearly we need more thought on how to deal with UDT's

Yeah. Not having a solution that handles arrays and composites, though, would feel pretty incomplete, since they would be one of the main beneficiaries from a performance standpoint. I guess minimally you'd need to expose some mechanic to look up oids, but being able to specify "foo"."bar" in the GUC would be pretty nice (albeit a lot more work).

> > This seems like a protocol or even a driver issue rather than a GUC issue.
> > Why does the server need to care what format the client might want to
> > prefer on a query by query basis?
>
> Actually this isn't a query by query basis. The point of this is that the
> client wants all the results for given OID's in binary.

Yep. Your rationale is starting to click. How would this interact with existing code bases? I get that JDBC is the main target, but how does this interact with libpq code that explicitly sets resultformat? Perhaps the answer should be that it shouldn't change documented behavior, and a hypothetical resultformat=2 could be reserved to default to text but allow for server control, and 3 as the same but default to binary.

merlin
Re: Request for comment on setting binary format output per session
On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer wrote: > > Dave Cramer > > > On Sat, 4 Mar 2023 at 19:39, Dave Cramer wrote: > >> >> >> On Sat, 4 Mar 2023 at 19:06, Tom Lane wrote: >> >>> Jeff Davis writes: >>> > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote: >>> >> Most of the clients know how to decode the builtin types. I'm not >>> >> sure there is a use case for binary encode types that the clients >>> >> don't have a priori knowledge of. >>> >>> > The client could, in theory, have a priori knowledge of a non-builtin >>> > type. >>> >>> I don't see what's "in theory" about that. There seems plenty of >>> use for binary I/O of, say, PostGIS types. Even for built-in types, >>> do we really want to encourage people to hard-wire their OIDs into >>> applications? >>> >> >> How does a client read these? I'm pretty narrowly focussed. The JDBC API >> doesn't really have a way to read a non built-in type. There is a facility >> to read a UDT, but the user would have to provide that transcoder. I guess >> I'm curious how other clients read binary UDT's ? >> >>> >>> I don't see a big problem with driving this off a GUC, but I think >>> it should be a list of type names not OIDs. We already have plenty >>> of precedent for dealing with that sort of thing; see search_path >>> for the canonical example. IIRC, there's similar caching logic >>> for temp_tablespaces. >>> >> >> I have no issue with allowing names, OID's were compact, but we could >> easily support both >> > > Attached is a preliminary patch that takes a list of OID's. I'd like to > know if this is going in the right direction. > > Next step would be to deal with type names as opposed to OID's. > This will be a bit more challenging as type names are schema specific. > OIDs are a pain to deal with IMO. They will not survive a dump style restore, and are hard to keep synchronized between databases...type names don't have this problem. OIDs are an implementation artifact that ought not need any extra dependency. 
This seems like a protocol or even a driver issue rather than a GUC issue. Why does the server need to care what format the client might want to prefer on a query by query basis? I just don't see it. The resultformat switch in libpq works pretty well, except that it's "all in" on getting data from the server, with the dead simple workaround of casting to text which might even be able to be managed from within the driver itself. merlin
feature request: IN clause optimized through append nodes with UNION ALL
In the script below, the presence of an IN clause forces the internal components of the UNION ALL clause to be fully computed even though they are fully optimizable. = ANY doesn't have this issue, so I wonder if there is any opportunity to convert the 'slow' variant (see below) to the 'fast' variant. Thank you!

merlin

drop table a cascade;
drop table b cascade;
drop table c cascade;

create table a (a_id int primary key);
create table b (b_id int primary key, a_id int references a);
create table c (c_id int primary key, b_id int references b);

insert into a select s from generate_series(1, 5) s;
insert into b select s, (s % 5) + 1 from generate_series(1, 10) s;
insert into c select s, (s % 10) + 1 from generate_series(1, 100) s;

create index on b (a_id, b_id);
create index on c (b_id, c_id);

analyze a;
analyze b;
analyze c;

create temp table d (a_id int);
insert into d values (99);
insert into d values (999);
insert into d default values;
analyze d;

create or replace view v as
  select * from a join b using (a_id) join c using (b_id)
  union all
  select * from a join b using (a_id) join c using (b_id);

explain analyze select * from v where a_id in (select a_id from d);             -- this is slow
explain analyze select * from v where a_id = any (array(select a_id from d));   -- this is fast
Re: pgcon unconference / impact of block size on performance
On Sat, Jun 4, 2022 at 6:23 PM Tomas Vondra wrote:
> Hi,
>
> At one of the pgcon unconference sessions a couple days ago, I presented
> a bunch of benchmark results comparing performance with different
> data/WAL block size. Most of the OLTP results showed significant gains
> (up to 50%) with smaller (4k) data pages.

Wow, the random I/O numbers are fantastic. The significant reduction in sequential throughput is a little painful, though; I see a 40% reduction in some cases, if I'm reading that right. Any thoughts on why that's the case? Are there mitigations possible?

merlin
Re: weird interaction between asynchronous queries and pg_sleep
On Thu, Apr 8, 2021 at 1:05 PM Merlin Moncure wrote:
> This effect is only noticeable when the remote query is returning
> volumes of data. My question is, is there any way to sleep loop
> client side without giving up a 3x performance penalty? Why is it
> that when more local sleep queries are executed, performance improves?

Looking at this more, it appears that when sleeping with pg_sleep(), libpq does not receive the data. I think for this type of pattern to work correctly, dblink would need a custom sleep function wrapping poll() (or epoll) that consumes input on the socket when it is signalled read-ready.

merlin
weird interaction between asynchronous queries and pg_sleep
Consider the following snippet:

create table data as select generate_series(1,100) s;

do $d$
begin
  PERFORM * FROM dblink_connect('test','');
  PERFORM * FROM dblink_send_query('test', 'SELECT * FROM data');
  LOOP
    IF dblink_is_busy('test') = 0 THEN
      PERFORM * FROM dblink_get_result('test') AS R(V int);
      PERFORM * FROM dblink_get_result('test') AS R(V int);
      RETURN;
    END IF;
    PERFORM pg_sleep(.001);
  END LOOP;
  PERFORM * FROM dblink_disconnect('test');
END;
$d$;

What's interesting here is that, when I vary the sleep parameter, I get:

0:      .4 seconds (per top, this is busywait), same as running synchronous
0.0001: 1.4 seconds
0.001:  2.4 seconds
0.01:   10.6 seconds
0.1:    does not terminate

This effect is only noticeable when the remote query is returning volumes of data. My question is: is there any way to sleep loop client side without giving up a 3x performance penalty? Why is it that when more local sleep queries are executed, performance improves?

merlin
Re: unconstrained memory growth in long running stored procedure after upgrading 11-12
On Tue, Mar 30, 2021 at 7:14 PM Tom Lane wrote: > > Justin Pryzby writes: > > On Tue, Mar 30, 2021 at 04:17:03PM -0500, Merlin Moncure wrote: > >> We just upgraded from postgres 11 to 12.6 and our server is running > >> out of memory and rebooting about 1-2 times a day. > > > I haven't tried your test, but this sounds a lot like the issue I reported > > with > > JIT, which is enabled by default in v12. > > FWIW, I just finished failing to reproduce any problem with that > test case ... but I was using a non-JIT-enabled build. Yep. Disabling jit (fully, via jit=off, not what was suggested upthread) eliminated the issue, or at least highly mitigated the leak. I was using pgdg rpm packaging, which enables jit by default. Thanks everyone for looking at this, and the workaround is quick and easy. merlin
unconstrained memory growth in long running stored procedure after upgrading 11-12
Hello all, We just upgraded from postgres 11 to 12.6 and our server is running out of memory and rebooting about 1-2 times a day. Application architecture is a single threaded stored procedure, executed with CALL, that loops and never terminates. With postgres 11 we had no memory issues. Ultimately the crash looks like this:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
2021-03-29 04:34:31.262 CDT [1413] LOG: server process (PID 9792) was terminated by signal 6: Aborted
2021-03-29 04:34:31.262 CDT [1413] DETAIL: Failed process was running: CALL Main()
2021-03-29 04:34:31.262 CDT [1413] LOG: terminating any other active server processes
2021-03-29 04:34:31.264 CDT [9741] WARNING: terminating connection because of crash of another server process
2021-03-29 04:34:31.264 CDT [9741] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-29 04:34:31.264 CDT [9741] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-03-29 04:34:31.267 CDT [1413] LOG: archiver process (PID 9742) exited with exit code 1
2021-03-29 04:34:31.267 CDT [1413] LOG: all server processes terminated; reinitializing

Attached is a self-contained test case which reproduces the problem. Instructions:
1. Run the attached script, pgtask_test.sql, in psql. It will create a database, initialize it, and run the main procedure. dblink must be available.
2. In another window, run SELECT CreateTaskChain('test', 'DEV');

In the console that ran main(), you should see output that the procedure began to do work. Once it does, a 'top' should show resident memory growth immediately. It's about a gigabyte an hour in my test. Sorry for the large-ish attachment. merlin

Attachments: pgtask_test.sql, pgtask.sql
Re: plpgsql variable assignment with union is broken
On Thu, Jan 7, 2021 at 10:55 AM Pavel Stehule wrote: >> Again, if this is narrowly confined to assignment into set query >> operations, maybe this is not so bad. But is it? > > PLpgSQL_Expr: opt_target_list > <--><--><-->from_clause where_clause > <--><--><-->group_clause having_clause window_clause > <--><--><-->opt_sort_clause opt_select_limit opt_for_locking_clause > <--><--><--><-->{ > > So SELECT INTO and assignment are not fully compatible now. OK. Well, I went ahead and checked the code base for any suspicious assignments that might fall foul of the tightened syntax. It was cursory, but didn't turn up anything obvious. So I'm going to change my position on this. My feedback would be to take backwards compatibility very seriously for language changes in the future, and to rely less on documentation arguments as justification for change. The current behavior has been in place for decades and is therefore a de facto standard. Freedom to change ought to be asserted in scenarios where behavior is reserved as undefined or is non-standard, rather than assumed. Rereading the development thread, it seems a fairly short timeframe passed between idea origination and commit, and hypothetical impacts to existing code bases were not rigorously assessed. I guess it's possible this may ultimately cause some heartburn, but it's hard to say without strong data to justify that position. Having said all of that, I'm very excited about the array assignment improvements, and investment in this space is very much appreciated. merlin
Re: plpgsql variable assignment with union is broken
On Wed, Jan 6, 2021 at 11:07 PM Pavel Stehule wrote: > On Thu, Jan 7, 2021 at 4:20 Merlin Moncure wrote: >> >> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: >> > >> > easter...@verfriemelt.org writes: >> > > i found, that the behaviour of variable assignment in combination with >> > > union is not working anymore: >> > > DO $$ >> > > DECLARE t bool; >> > > begin >> > > t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS >> > > a; >> > > END $$; >> > >> > > is this an intended change or is it a bug? >> > >> > It's an intended change, or at least I considered the case and thought >> > that it was useless because assignment will reject any result with more >> > than one row. Do you have any non-toy example that wouldn't be as >> > clear or clearer without using UNION? The above sure seems like an >> > example of awful SQL code. >> >> What is the definition of broken here? What is the behavior of the >> query with the change and why? >> >> OP's query provably returns a single row and ought to always assign >> true as written. A real world example might evaluate multiple >> condition branches so that the assignment resolves true if any branch >> is true. It could be rewritten with 'OR' of course. >> >> Is this also "broken"? >> t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT >> 'something else' AS a WHERE NOT _Flag; >> >> What about this? >> SELECT INTO t true WHERE false >> UNION select true; > > > ANSI SQL allows only SELECT INTO or var := SQL expression and SQL expression > can be (subquery) too. This is PL/pgSQL, not ANSI SQL, so that's irrelevant. If queries along the lines of: var := FROM (SELECT ..) UNION .. are narrowly broken, OK, but is that all that's going on here? I guess I ought to test. I have a 300k line pl/pgsql project; this thread is terrifying me. 
I am going to be blunt here and say I am not comfortable with tightening pl/pgsql syntax without an impact assessment. The point that this is undocumented behavior is weak, and it's already turning up problem reports. IMO, the expectation has been clearly set that var := expression; is more or less interchangeable with SELECT INTO var expression; Again, if this is narrowly confined to assignment into set query operations, maybe this is not so bad. But is it? merlin
Re: plpgsql variable assignment with union is broken
On Wed, Jan 6, 2021 at 9:39 PM Tom Lane wrote: > > Merlin Moncure writes: > > On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: > >> easter...@verfriemelt.org writes: > >>> i found, that the behaviour of variable assignment in combination with > >>> union is not working anymore: > >>> DO $$ > >>> DECLARE t bool; > >>> begin > >>> t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; > >>> END $$; > >>> is this an intended change or is it a bug? > > >> It's an intended change, or at least I considered the case and thought > >> that it was useless because assignment will reject any result with more > >> than one row. Do you have any non-toy example that wouldn't be as > >> clear or clearer without using UNION? The above sure seems like an > >> example of awful SQL code. > > > What is the definition of broken here? What is the behavior of the > > query with the change and why? > > The OP is complaining that that gets a syntax error since c9d529848. > > > OP's query provably returns a single row and ought to always assign > > true as written. > > My opinion is that (a) it's useless and (b) there has never been any > documentation that claimed that you could do this. Here is what the documentation says: > variable { := | = } expression; > As explained previously, the expression in such a statement is evaluated by > means of an SQL SELECT command sent to the main database engine. This is valid SQL: SELECT a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; So I'd argue that OP's query *is* syntactically valid per the rules as I understand them, and my opinion is entirely consistent with the documentation in that it (a) resolves exactly one row, and (b) is made syntactically valid by prefixing the expression with SELECT. Aesthetic considerations are irrelevant IMO. merlin
Re: plpgsql variable assignment with union is broken
On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: > > easter...@verfriemelt.org writes: > > i found, that the behaviour of variable assignment in combination with > > union is not working anymore: > > DO $$ > > DECLARE t bool; > > begin > > t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; > > END $$; > > > is this an intended change or is it a bug? > > It's an intended change, or at least I considered the case and thought > that it was useless because assignment will reject any result with more > than one row. Do you have any non-toy example that wouldn't be as > clear or clearer without using UNION? The above sure seems like an > example of awful SQL code. What is the definition of broken here? What is the behavior of the query with the change and why? OP's query provably returns a single row and ought to always assign true as written. A real world example might evaluate multiple condition branches so that the assignment resolves true if any branch is true. It could be rewritten with 'OR' of course. Is this also "broken"? t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT 'something else' AS a WHERE NOT _Flag; What about this? SELECT INTO t true WHERE false UNION select true; merlin
Re: Zedstore - compressed in-core columnar storage
On Mon, Nov 16, 2020 at 10:07 AM Tomas Vondra wrote: > > > On 11/16/20 1:59 PM, Merlin Moncure wrote: > > On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra > > wrote: > >>masterzedstore/pglzzedstore/lz4 > >> - > >>copy 1855680922131 > >>dump 751 905 811 > >> > >> And the size of the lineitem table (as shown by \d+) is: > >> > >> master: 64GB > >> zedstore/pglz: 51GB > >> zedstore/lz4: 20GB > >> > >> It's mostly expected lz4 beats pglz in performance and compression > >> ratio, but this seems a bit too extreme I guess. Per past benchmarks > >> (e.g. [1] and [2]) the difference in compression/decompression time > >> should be maybe 1-2x or something like that, not 35x like here. > > > > I can't speak to the ratio, but in basic backup/restore scenarios pglz > > is absolutely killing me; Performance is just awful; we are cpubound > > in backups throughout the department. Installations defaulting to > > plgz will make this feature show very poorly. > > > > Maybe. I'm not disputing that pglz is considerably slower than lz4, but > judging by previous benchmarks I'd expect the compression to be slower > maybe by a factor of ~2x. So the 30x difference is suspicious. Similarly > for the compression ratio - lz4 is great, but it seems strange it's 1/2 > the size of pglz. Which is why I'm speculating that something else is > going on. > > As for the "plgz will make this feature show very poorly" I think that > depends. I think we may end up with pglz doing pretty well (compared to > heap), but lz4 will probably outperform that. OTOH for various use cases > it may be more efficient to use something else with worse compression > ratio, but allowing execution on compressed data, etc. hm, you might be right. Doing some number crunching, I'm getting about 23mb/sec compression on a 600gb backup image on a pretty typical aws server. That's obviously not great, but your numbers are much worse than that, so maybe something else might be going on. 
> I think we may end up with pglz doing pretty well (compared to heap) I *don't* think so, or at least I'm skeptical as long as insertion times are part of the overall performance measurement. Naturally, with column stores, insertion times are often very peripheral to the overall performance picture but for cases that aren't I suspect the results are not going to be pleasant, and advise planning accordingly. Aside, I am very interested in this work. I may be able to support testing in an enterprise environment; lmk if interested -- thank you merlin
Re: Zedstore - compressed in-core columnar storage
On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra wrote: >masterzedstore/pglzzedstore/lz4 > - >copy 1855680922131 >dump 751 905 811 > > And the size of the lineitem table (as shown by \d+) is: > > master: 64GB > zedstore/pglz: 51GB > zedstore/lz4: 20GB > > It's mostly expected lz4 beats pglz in performance and compression > ratio, but this seems a bit too extreme I guess. Per past benchmarks > (e.g. [1] and [2]) the difference in compression/decompression time > should be maybe 1-2x or something like that, not 35x like here. I can't speak to the ratio, but in basic backup/restore scenarios pglz is absolutely killing me; Performance is just awful; we are cpubound in backups throughout the department. Installations defaulting to plgz will make this feature show very poorly. merlin
Re: Optimize memory allocation code
On Fri, Sep 25, 2020 at 7:32 PM Li Japin wrote: > > > > > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote: > > > > Hi, > > > > On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote: > >> > >> Hi, hackers! > >> > >> I find the palloc0() is similar to the palloc(), we can use palloc() > >> inside palloc0() > >> to allocate space, thereby I think we can reduce duplication of code. > > > > The code is duplicated on purpose. There's a comment at the beginning > > that mentions it: > > > > /* duplicates MemoryContextAllocZero to avoid increased overhead */ > > > > Same for MemoryContextAllocZero() itself. > > Thanks! How big is this overhead? Is there any way I can test it? Profiler. For example, oprofile. In hot areas of the code (memory allocation is very hot), profiling is the first step. merlin
Re: dblink_is_busy returns 1 for dead connections
On Sun, Aug 2, 2020 at 9:55 PM Merlin Moncure wrote: > > On Sun, Aug 2, 2020 at 7:18 PM Merlin Moncure wrote: > > > > Hackers, > > > > I have a situation that I am observing where dblink_is_busy returns 1 > > even though the connection is long gone. tcp keepalives are on and > > the connection has been dead for several hours. Looking at the call > > for dblink_is_busy, I see that it is a thin wrapper to PQisBusy(). > > If I attempt to call dblink_get_result(), the result comes back with > > an error message, 'invalid socket'. This however is not helpful since > > there is no way to probe for dead connections in dblink that appears > > to be 100% reliable. My workaround that I had been relying on was to > > call dblink_get_notify twice, which for some weird reason forced the > > connection error to the surface. However for whatever reason, that is > > not working here. > > > > In this case the connection was cancelled via dblink_cancel_query(), so in > > some scenarios connections cancelled that way seem to become 'stuck'. > > Any thoughts on this? > > Correction, keepalives are probably not on, because dblink does not > have an option to set them. Also, it looks like dblink_is_busy() > calls pqConsumeInput without checking the error code. Is that safe? I could not reproduce this with an application-external test script (see attached if curious). I also noticed you can set keepalives in the libpq connection string, so I'll do that and see if it helps, and report back for posterity. Thanks, sorry for the noise. merlin

Attachment: dblink_test.sql
Re: dblink_is_busy returns 1 for dead connections
On Sun, Aug 2, 2020 at 7:18 PM Merlin Moncure wrote: > > Hackers, > > I have a situation that I am observing where dblink_is_busy returns 1 > even though the connection is long gone. tcp keepalives are on and > the connection has been dead for several hours. Looking at the call > for dblink_is_busy, I see that it is a thin wrapper to PQisBusy(). > If I attempt to call dblink_get_result(), the result comes back with > an error message, 'invalid socket'. This however is not helpful since > there is no way to probe for dead connections in dblink that appears > to be 100% reliable. My workaround that I had been relying on was to > call dblink_get_notify twice, which for some weird reason forced the > connection error to the surface. However for whatever reason, that is > not working here. > > In this case the connection was cancelled via dblink_cancel_query(), so in > some scenarios connections cancelled that way seem to become 'stuck'. > Any thoughts on this? Correction, keepalives are probably not on, because dblink does not have an option to set them. Also, it looks like dblink_is_busy() calls pqConsumeInput without checking the error code. Is that safe? merlin
dblink_is_busy returns 1 for dead connections
Hackers, I have a situation that I am observing where dblink_is_busy returns 1 even though the connection is long gone. tcp keepalives are on and the connection has been dead for several hours. Looking at the call for dblink_is_busy, I see that it is a thin wrapper to PQisBusy(). If I attempt to call dblink_get_result(), the result comes back with an error message, 'invalid socket'. This however is not helpful since there is no way to probe for dead connections in dblink that appears to be 100% reliable. My workaround that I had been relying on was to call dblink_get_notify twice, which for some weird reason forced the connection error to the surface. However for whatever reason, that is not working here. In this case the connection was cancelled via dblink_cancel_query(), so in some scenarios connections cancelled that way seem to become 'stuck'. Any thoughts on this? merlin
Re: database stuck in __epoll_wait_nocancel(). Are infinite timeouts safe?
On Fri, Mar 13, 2020 at 2:28 PM Andres Freund wrote: > > Hi, > > On March 13, 2020 12:08:32 PM PDT, Merlin Moncure wrote: > >I have 5 servers in a testing environment that are comprise a data > >warehousing cluster. They will typically get each get exactly the > >same query at approximately the same time. Yesterday, around 1pm, 3 > >of the five got stuck on the same query. Each of them yields similar > >stack traces. This happens now and then. The server is 9.6.12 > >(which is obviously old, but I did not see any changes in relevant > >code). > > > >(gdb) bt > >#0 0x7fe856c0b463 in __epoll_wait_nocancel () from > >/lib64/libc.so.6 > >#1 0x006b4416 in WaitEventSetWaitBlock (nevents=1, > >occurred_events=0x7ffc9f2b0f60, cur_timeout=-1, set=0x27cace8) at > >latch.c:1053 > >#2 WaitEventSetWait (set=0x27cace8, timeout=timeout@entry=-1, > >occurred_events=occurred_events@entry=0x7ffc9f2b0f60, > >nevents=nevents@entry=1) at latch.c:1007 > >#3 0x005f26dd in secure_write (port=0x27f16a0, > >ptr=ptr@entry=0x27f5528, len=len@entry=192) at be-secure.c:255 > >#4 0x005fb51b in internal_flush () at pqcomm.c:1410 > >#5 0x005fb72a in internal_putbytes (s=0x2a4f245 "14M04", > >s@entry=0x2a4f228 "", len=70) at pqcomm.c:1356 > >#6 0x005fb7f0 in socket_putmessage (msgtype=68 'D', > >s=0x2a4f228 "", len=) at pqcomm.c:1553 > >#7 0x005fd5d9 in pq_endmessage (buf=buf@entry=0x7ffc9f2b1040) > >at pqformat.c:347 > >#8 0x00479a63 in printtup (slot=0x2958fc8, self=0x2b6bca0) at > >printtup.c:372 > >#9 0x005c1cc9 in ExecutePlan (dest=0x2b6bca0, > >direction=, numberTuples=0, sendTuples=1 '\001', > >operation=CMD_SELECT, > >use_parallel_mode=, planstate=0x2958cf8, > >estate=0x2958be8) at execMain.c:1606 > >#10 standard_ExecutorRun (queryDesc=0x2834998, direction= >out>, count=0) at execMain.c:339 > >#11 0x006d69a7 in PortalRunSelect > >(portal=portal@entry=0x2894e38, forward=forward@entry=1 '\001', > >count=0, count@entry=9223372036854775807, > >dest=dest@entry=0x2b6bca0) at pquery.c:948 > >#12 
0x006d7dbb in PortalRun (portal=0x2894e38, > >count=9223372036854775807, isTopLevel=, dest=0x2b6bca0, > >altdest=0x2b6bca0, > >completionTag=0x7ffc9f2b14e0 "") at pquery.c:789 > >#13 0x006d5a06 in PostgresMain (argc=, > >argv=, dbname=, username= >out>) at postgres.c:1109 > >#14 0x0046fc28 in BackendRun (port=0x27f16a0) at > >postmaster.c:4342 > >#15 BackendStartup (port=0x27f16a0) at postmaster.c:4016 > >#16 ServerLoop () at postmaster.c:1721 > >#17 0x00678119 in PostmasterMain (argc=argc@entry=3, > >argv=argv@entry=0x27c8c90) at postmaster.c:1329 > >#18 0x0047088e in main (argc=3, argv=0x27c8c90) at main.c:228 > >(gdb) quit > > > >Now, the fact that this happened to multiple servers at time strongly > >suggest an external (to the database) problem. The system initiating > >the query, a cross database query over dblink, has been has given up > >(and has been restarted as a precaution) a long time ago, and the > >connection is dead. secure_write() sets however an infinite timeout > >to the latch, and there are clearly scenarios where epoll waits > >forever for an event that is never going to occur. If/when this > >happens, the only recourse is to restart the impacted database. The > >question is, shouldn't the latch have a looping timeout that checks > >for interrupts? What would the risks be of jumping directly out of > >the latch loop? > > Unless there is a kernel problem latches are interruptible by signals, as the > signal handler should do a SetLatch(). > > This backtrace just looks like the backend is trying to send data to the > client? What makes you think it's stuck? Well, the client has been gone for > 24 hours. 
But you're right; when I send cancel to the backend, here is what happens according to strace:

epoll_wait(3, 0x2915e08, 1, -1) = -1 EINTR (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=5024, si_uid=26} ---
write(13, "\0", 1) = 1
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
sendto(11, "\0\0\0\00545780\0\0\0\003508D\0\0\0d\0\t\0\0\0\00615202"..., 5640, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, [{EPOLLIN, {u32=43081176, u64=43081176}}], 1, -1) = 1
read(12, "\0", 16) = 1
epoll_wait(3, ...

pg_terminate_backend(), however, does properly kill the query.

> If the connection is dead, epoll should return (both because we ask for the > relevant events, and because it just always implicitly does do so). > > So it seems likely that either your connection isn't actually dead (e.g. > waiting for tcp timeouts), or you have a kernel bug.

Maybe; I suspect a firewall issue. Hard to say. merlin
database stuck in __epoll_wait_nocancel(). Are infinite timeouts safe?
I have 5 servers in a testing environment that comprise a data warehousing cluster. They will typically each get exactly the same query at approximately the same time. Yesterday, around 1pm, three of the five got stuck on the same query. Each of them yields similar stack traces. This happens now and then. The server is 9.6.12 (which is obviously old, but I did not see any changes in relevant code).

(gdb) bt
#0 0x7fe856c0b463 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x006b4416 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffc9f2b0f60, cur_timeout=-1, set=0x27cace8) at latch.c:1053
#2 WaitEventSetWait (set=0x27cace8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffc9f2b0f60, nevents=nevents@entry=1) at latch.c:1007
#3 0x005f26dd in secure_write (port=0x27f16a0, ptr=ptr@entry=0x27f5528, len=len@entry=192) at be-secure.c:255
#4 0x005fb51b in internal_flush () at pqcomm.c:1410
#5 0x005fb72a in internal_putbytes (s=0x2a4f245 "14M04", s@entry=0x2a4f228 "", len=70) at pqcomm.c:1356
#6 0x005fb7f0 in socket_putmessage (msgtype=68 'D', s=0x2a4f228 "", len=) at pqcomm.c:1553
#7 0x005fd5d9 in pq_endmessage (buf=buf@entry=0x7ffc9f2b1040) at pqformat.c:347
#8 0x00479a63 in printtup (slot=0x2958fc8, self=0x2b6bca0) at printtup.c:372
#9 0x005c1cc9 in ExecutePlan (dest=0x2b6bca0, direction=, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=, planstate=0x2958cf8, estate=0x2958be8) at execMain.c:1606
#10 standard_ExecutorRun (queryDesc=0x2834998, direction=, count=0) at execMain.c:339
#11 0x006d69a7 in PortalRunSelect (portal=portal@entry=0x2894e38, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x2b6bca0) at pquery.c:948
#12 0x006d7dbb in PortalRun (portal=0x2894e38, count=9223372036854775807, isTopLevel=, dest=0x2b6bca0, altdest=0x2b6bca0, completionTag=0x7ffc9f2b14e0 "") at pquery.c:789
#13 0x006d5a06 in PostgresMain (argc=, argv=, dbname=, username=) at postgres.c:1109
#14 0x0046fc28 in BackendRun (port=0x27f16a0) at postmaster.c:4342
#15 BackendStartup (port=0x27f16a0) at postmaster.c:4016
#16 ServerLoop () at postmaster.c:1721
#17 0x00678119 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x27c8c90) at postmaster.c:1329
#18 0x0047088e in main (argc=3, argv=0x27c8c90) at main.c:228
(gdb) quit

Now, the fact that this happened to multiple servers at once strongly suggests an external (to the database) problem. The system initiating the query, a cross database query over dblink, gave up (and was restarted as a precaution) a long time ago, and the connection is dead. secure_write() however sets an infinite timeout on the latch, and there are clearly scenarios where epoll waits forever for an event that is never going to occur. If/when this happens, the only recourse is to restart the impacted database. The question is, shouldn't the latch have a looping timeout that checks for interrupts? What would the risks be of jumping directly out of the latch loop? merlin
Re: Error on failed COMMIT
On Mon, Feb 24, 2020 at 4:06 PM Vladimir Sitnikov wrote: > > Merlin>My biggest sense of alarm with the proposed change is that it could > Merlin>leave applications in a state where the transaction is hanging there > > How come? > The spec says commit ends the transaction. > Can you please clarify where the proposed change leaves a hanging transaction? > > Just in case, the proposed change is as follows: > > postgres=# begin; > BEGIN > postgres=# aslkdfasdf; > ERROR: syntax error at or near "aslkdfasdf" > LINE 1: aslkdfasdf; > ^ > postgres=# commit; > ROLLBACK <-- this should be replaced with "ERROR: can't commit the > transaction because ..." > postgres=# commit; > WARNING: there is no transaction in progress <-- this should be as it is > currently. Even if commit throws an error, the transaction should be > terminated. > COMMIT Ok, you're right; I missed the point in that it's not nearly as bad as I thought you were suggesting (to treat commit as a bad statement), but the transaction would still terminate. Still, this is very sensitive stuff; do you think most common connection poolers would continue to work after making this change? merlin
Re: Error on failed COMMIT
On Sun, Feb 23, 2020 at 7:59 PM Dave Cramer wrote: > > I think the fact that this is a violation of the SQL SPEC lends considerable > credence to the argument for changing the behaviour. > Since this can lead to losing a transaction I think there is even more reason > to look at changing the behaviour. The assumption that COMMIT terminates the transaction is going to be deeply embedded into many applications. It's just too convenient not to rely on. For example, I maintain a bash based deployment framework that assembles large SQL files from bits and pieces and tacks a COMMIT at the end. It's not *that* much work to test for failure and add a rollback, but it's the kind of surprise our users hate during the upgrade process. Over the years we've tightened the behavior of postgres to be in line with the spec (example: Tom cleaned up the row-wise comparison behavior in 8.2), but in other cases we had to punt (IS NULL/coalesce disagreement over composites, identifier case sensitivity, etc.). The point is, changing this stuff can be really painful and we have to evaluate the benefits vs the risks. My biggest sense of alarm with the proposed change is that it could leave applications in a state where the transaction is hanging where it could previously assume it had resolved; this could be catastrophic in impact in certain real world scenarios. Tom is right, a GUC is the equivalent of "sweeping the problem under the rug" (if you want examples of the long term consequences of that vision read through this: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initialization.html). The value proposition of the change is however a little light relative to the risks IMO. I do think we need to have a good page summarizing non-spec behaviors in the documentation however. merlin
Re: DB running out of memory issues after upgrade
On Tue, Feb 18, 2020 at 12:10 PM Nagaraj Raj wrote: > > Below are the same configurations in the .conf file before and after upgrade > > show max_connections; = 1743 > show shared_buffers = "4057840kB" > show effective_cache_size = "8115688kB" > show maintenance_work_mem = "259MB" > show checkpoint_completion_target = "0.9" > show wal_buffers = "16MB" > show default_statistics_target = "100" > show random_page_cost = "1.1" > show effective_io_concurrency =" 200" > show work_mem = "4MB" > show min_wal_size = "256MB" > show max_wal_size = "2GB" > show max_worker_processes = "8" > show max_parallel_workers_per_gather = "2" This smells like the OOM killer for sure. How did you arrive at some of these values, in particular max_connections and effective_cache_size? How much memory is in this server? merlin
Re: [HACKERS] emergency outage requiring database restart
On Tue, Jan 3, 2017 at 1:05 PM Peter Eisentraut wrote: > > On 11/7/16 5:31 PM, Merlin Moncure wrote: > > Regardless, it seems like you might be on to something, and I'm > > inclined to patch your change, test it, and roll it out to production. > > If it helps or at least narrows the problem down, we ought to give it > > consideration for inclusion (unless someone else can think of a good > > reason not to do that, heh!). > > Any results yet? Not yet. But I do have some interesting findings. At this point I do not think the problem is within pl/sh itself, but that when a process invoked from pl/sh misbehaves, that misbehavior can penetrate into the database processes. I also believe that this problem is fd related, so the 'close on exec' flag might reasonably fix it. All cases of database damage I have observed remain completely mitigated by enabling database checksums. Recently, a sqsh process kicked off via pl/sh crashed with signal 11 but the database process was otherwise intact and fine. This is strong supporting evidence for my points above, I think. I've also turned up a fairly reliable reproduction case from some unrelated application changes. If I can demonstrate that the close-on-exec flag works and prevents these occurrences we can close the book on this. merlin
Re: Greatest Common Divisor
On Mon, Jan 6, 2020 at 6:52 AM Fabien COELHO wrote: > > > Hello Robert, > > >> if (arg1 == PG_INT32_MIN) > >> if (arg2 == 0 || arg2 == PG_INT32_MIN) > >> > >> And possibly a "likely" on the while. > > > > I don't think decoration the code with likely() and unlikely() all > > over the place is a very good idea. > > > Odds are good that we'll end up with a bunch that are actually > > non-optimal, and nobody will ever figure it out because it's hard to > > figure out. > > My 0.02€: I'd tend to disagree. > > Modern pipelined processors can take advantage of speculative execution on > branches, so if you know which branch is the more likely it can help. > > Obviously if you get it wrong it does not, but for the above cases it > seems to me that they are rather straightforward. > > It also provides some "this case is expected to be exceptional" semantics > to people reading the code. > > > I have a hard time believing that we're going to be much > > worse off if we just write the code normally. > > I think that your point applies to more general programming in postgres, > but this is not the context here. > > For low-level arithmetic code like this one, with tests and loops > containing very few hardware instructions, I think that helping compiler > optimizations is a good idea. Do you have any performance data to back that up? merlin
Re: Greatest Common Divisor
On Fri, Jan 3, 2020 at 1:32 PM Robert Haas wrote: > > On Fri, Jan 3, 2020 at 2:27 PM Chapman Flack wrote: > > On 1/3/20 2:11 PM, Robert Haas wrote: > > > and moving things to another schema does not help with that. It does > > > potentially help with the namespace pollution issue, but how much of > > > an issue is that anyway? Unless you've set up an unusual search_path > > > configuration, your own schemas probably precede pg_catalog in your > > > search path, besides which it seems unlikely that many people have a > > > gcd() function that does anything other than take the greatest common > > > divisor. > > > > As seen in this thread though, there can be edge cases of "take the > > greatest common divisor" that might not be identically treated in a > > thoroughly-reviewed addition to core as in someone's hastily-rolled > > local version. > > True, but because of the way search_path is typically set, they'd > probably continue to get their own version anyway, so I'm not sure > what the problem is. Is that right? The default search_path resolves pg_catalog before public. Lightly testing with a hand rolled pg_advisory_lock implementation that raises a notice, my default database seemed to prefer the built-in function. Maybe I'm not following you. > There are counter-arguments to that, though. Maintaining a lot of > extensions with only one or two functions in them is a nuisance. > Having things installed by default is convenient for wanting to use > them. Maintaining contrib code so that it works whether or not the SQL > definitions have been updated via ALTER EXTENSION .. UPDATE takes some > work and thought, and sometimes we screw it up. If the external contract changes (which seems likely for gcd) then I would much rather have the core team worry about this than force your users to worry about it, which is what putting the function in core would require them to do (if version < x call it this way, > y then that way etc). 
This is exactly why we shouldn't be putting non-standard items in core (maybe excepting some pg_ prefixed administration functions). Now, it's quite unfair to $OP to saddle his proposal and patch with the broader considerations of core/extension packaging, so if some kind of rational framework can be applied to the NEXT submission, or at least a discussion about this can start, those are all good options. But we need to start from somewhere, and moving forward with, "If it's not sql standard or prefixed with pg_, it ought not to be in pg_catalog" might be a good way to open the discussion. merlin
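The search_path behavior at issue in this thread can be checked directly; a sketch (the override function is purely illustrative):

```sql
SHOW search_path;   -- typically: "$user", public
-- pg_catalog is searched implicitly *before* the explicit entries unless it
-- is listed in search_path itself, so a same-named function in public does
-- not shadow the built-in:
CREATE FUNCTION public.now() RETURNS timestamptz
    LANGUAGE sql AS $$ SELECT 'epoch'::timestamptz $$;
SELECT now();          -- still resolves to pg_catalog.now()
SELECT public.now();   -- the override must be schema-qualified
```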
Re: Greatest Common Divisor
On Fri, Jan 3, 2020 at 10:24 AM Robert Haas wrote: > > On Fri, Jan 3, 2020 at 10:23 AM Tom Lane wrote: > > Now, those functions were just exposing libc functionality, so there > > wasn't a lot of code to write. There might be a good argument that > > gcd isn't useful enough to justify the amount of code we'd have to > > add (especially if we allow it to scope-creep into needing to deal > > with "numeric" calculations). But I'm not on board with just > > dismissing it as uninteresting. > > Yeah. There's always the question with things like this as to whether > we ought to push certain things into contrib modules that are not > installed by default to avoid bloating the set of things built into > the core server. But it's hard to know where to draw the line. Just stop doing it. It's very little extra work to package an item into an extension and this protects your hapless users who might have implemented a function called gcd() that does something different. Ideally, the public namespace should contain (by default) only sql standard functions with all non-standard material in an appropriate extension. Already released material is obviously problematic and needs more thought but we ought to at least stop making the problem worse if possible. merlin
Re: Greatest Common Divisor
On Sat, Dec 28, 2019 at 12:15 PM Fabien COELHO wrote: > > > Bonsoir Vik, > > > I recently came across the need for a gcd function (actually I needed > > lcm) and was surprised that we didn't have one. > > Why not. Proliferation of code in the public namespace; it can displace code that is written by others during the upgrade. merlin
Re: inherits clause for CREATE TYPE?
On Wed, Dec 18, 2019 at 12:38 PM Pavel Stehule wrote: > > Hi > > I had a talk with one boy about development in plpgsql. He uses table's > functions. More times he uses returns types based on some table type + few > attributes. Now he use a ugly hack - he create a view on table plus some > columns - and then he use the view related type as table function result > type. For similar uses cases there can be interesting to have a possibility > to create types by extending other types. Probably almost all functionality > is inside now - so it should not be hard work. > > My idea is implement inherits clause for CREATE TYPE command. > > Some like > > CREATE TYPE fx_rt (xx int) INHERITS(pg_class); > > What do you think about this idea? How about using composition style approaches? create type base as (...) create type extended as (b base, ...) create function foo() returns extended as ... merlin
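The composition-style alternative sketched above, fleshed out slightly (type and function names are illustrative):

```sql
-- instead of CREATE TYPE fx_rt (xx int) INHERITS (pg_class),
-- embed the "base" type as a column of the extended type:
CREATE TYPE base AS (id int, name text);
CREATE TYPE extended AS (b base, xx int);

CREATE FUNCTION foo() RETURNS extended LANGUAGE sql AS
$$ SELECT (ROW(1, 'a')::base, 42)::extended $$;

-- nested fields are reachable with parenthesized field selection:
SELECT ((f).b).name, (f).xx FROM foo() f;
```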
Re: Why JIT speed improvement is so modest?
On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik wrote: > JIT was not able to significantly (times) increase speed on Q1 query? > Experiment with VOPS shows that used aggregation algorithm itself is not > a bottleneck. > Profile also give no answer for this question. > Any ideas? Well, in the VOPS variant around 2/3 of the time is spent in routines that are obviously aggregation. In the JIT version, it's around 20%. So this suggests that the replacement execution engine is more invasive. I would also guess (!) that the VOPS engine optimizes fewer classes of query plan. ExecScan, for example, looks to be completely optimized out in VOPS but is still utilized in the JIT engine. I experimented with Vitessa a couple of years back and this was consistent with my recollection. merlin
Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
On Tue, Aug 27, 2019 at 5:52 PM Merlin Moncure wrote: > > On Mon, Aug 26, 2019 at 12:01 PM Tom Lane wrote: > > > > Justin Pryzby writes: > > > On Mon, Aug 26, 2019 at 12:45:24PM -0400, Tom Lane wrote: > > >> However ... there is some pretty interesting info at > > >> https://bugzilla.redhat.com/show_bug.cgi?id=1338673 > > >> suggesting that compiling with a late-model gcc against older RHEL6 > > >> headers could result in bad code. I wonder whether the reporters' > > >> servers were built using such a configuration. (Although the linkage, > > >> if any, to this report still wouldn't be very clear.) > > > > > I can tell it was compiled using > > > version | PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) > > > 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit > > > > Ah, that appears to be the default compiler for RHEL6, so that theory > > is out the window. It's still interesting that we're only seeing this > > reported from RHEL6 ... maybe there's something specific to the code > > that this gcc version generates? > > FWIW, I've got the same compiler (which is natural, we are likely > using the same packaging): > PostgreSQL 9.6.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 > (Red Hat 4.4.7-23), 64-bit I'm looking for more common threads here. One interesting fact about this server is that we just realized (!) that cron was attempting to email huge log files and failing due to misconfigured email. The mail queue was building into the gigabytes so that on a half-hour cadence the server was running out of memory until the mail process crashed. Postgres ran just fine through this process (which is pretty cool). So, question: were there any memory issues on the other server on or around the crash? merlin
no mailing list hits in google
Hackers, [apologies if this is the incorrect list or is already discussed material] I've noticed that mailing list discussions in -hackers and other mailing lists appear to not be indexed by google -- at all. We are also not being tracked by any mailing list aggregators -- in contrast to a decade ago, when we had nabble and other systems to collect and organize results (tbh, often better than we do), we are now at an extreme disadvantage; mailing list activity was formerly an absolutely fantastic resource for researching solutions to obscure technical problems in the database via google. Limited access to this information will directly lead to increased bug reports, lack of solution confidence, etc. My test case here is the query: pgsql-hackers ExecHashJoinNewBatch I was searching out a link to a recent bug report for copy/paste into corporate email. In the old days this would fire right up but now returns no hits even though the discussion is available in the archives (which I had to find by looking up the specific day the thread was active). Just a heads up. merlin
Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
On Mon, Aug 26, 2019 at 12:01 PM Tom Lane wrote: > > Justin Pryzby writes: > > On Mon, Aug 26, 2019 at 12:45:24PM -0400, Tom Lane wrote: > >> However ... there is some pretty interesting info at > >> https://bugzilla.redhat.com/show_bug.cgi?id=1338673 > >> suggesting that compiling with a late-model gcc against older RHEL6 > >> headers could result in bad code. I wonder whether the reporters' > >> servers were built using such a configuration. (Although the linkage, > >> if any, to this report still wouldn't be very clear.) > > > I can tell it was compiled using > > version | PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) > > 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit > > Ah, that appears to be the default compiler for RHEL6, so that theory > is out the window. It's still interesting that we're only seeing this > reported from RHEL6 ... maybe there's something specific to the code > that this gcc version generates? FWIW, I've got the same compiler (which is natural, we are likely using the same packaging): PostgreSQL 9.6.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit merlin
Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
On Sun, Aug 25, 2019 at 9:35 PM Thomas Munro wrote: > > On Mon, Aug 26, 2019 at 1:44 PM Justin Pryzby wrote: > > On Mon, Aug 26, 2019 at 01:09:19PM +1200, Thomas Munro wrote: > > > On Sun, Aug 25, 2019 at 3:15 PM Peter Geoghegan wrote: > > > > I was reminded of this issue from last year, which also appeared to > > > > involve BufFileClose() and a double-free: > > > > > > > > https://postgr.es/m/87y3hmee19@news-spur.riddles.org.uk > > > > > > > > That was a BufFile that was under the control of a tuplestore, so it > > > > was similar to but different from your case. I suspect it's related. > > > > > > Hmm. tuplestore.c follows the same coding pattern as nodeHashjoin.c: > > > it always nukes its pointer after calling BufFileFlush(), so it > > > shouldn't be capable of calling it twice for the same pointer, unless > > > we have two copies of that pointer somehow. > > > > > > Merlin's reported a double-free apparently in ExecHashJoin(), not > > > ExecHashJoinNewBatch() like this report. Unfortunately that tells us > > > very little. > > Here's another one: > > https://www.postgresql.org/message-id/flat/20170601081104.1500.56202%40wrigleys.postgresql.org > > Hmm. Also on RHEL/CentOS 6, and also involving sorting, hashing, > BufFileClose() but this time the glibc double free error is in > repalloc(). > > And another one (repeatedly happening): > > https://www.postgresql.org/message-id/flat/3976998C-8D3B-4825-9B10-69ECB70A597A%40appnexus.com > > Also on RHEL/CentOS 6, this time a sort in once case and a hash join > in another case. > > Of course it's entirely possible that we have a bug here and I'm very > keen to find it, but I can't help noticing the common factor here is > that they're all running ancient RHEL 6.x releases, except Merlin who > didn't say. Merlin? Just noticed this. redhat-release: "Red Hat Enterprise Linux Server release 6.9 (Santiago)" merlin
Re: Hstore OID bigger than an integer
On Fri, Aug 23, 2019 at 9:26 AM Roberto Mireles wrote: > > Hello team, > > This is the first time I post here, if you can provide some help, would be > much appreciated. > > I have an application that can not access the database due to OID value for > hstore extension is bigger than an integer value. Application uses a NpgSql > driver that only supports integer types for OIDs. > > We have a new app version, which uses a new driver version that supports > bigint and has no issues at all, but for that specific database, we still > need to use that old version of the app. > > I have searched for a workaround or something that can help us to be able to > connect to the database, but have not found anything related. > > I also tried by dropping, creating extension again, but same result. > > Does any of you know any workaround that can help here? A full dump/restore of the database (via pg_dump) might work, as long as the oids are not dumped with the database, which I believe to be the default. This ought to reset the oid counter. Some while back, oids were changed so the counter was table specific (IIRC). Not sure if that was after 9.2. If it was, upgrading the database (which you should be looking at anyways) might help. Also, raise an issue upstream. merlin
Re: double free in ExecHashJoin, 9.6.12
On Wed, Jul 24, 2019 at 11:01 PM Thomas Munro wrote: > > On Thu, Jul 25, 2019 at 2:39 AM Merlin Moncure wrote: > > Server is generally running pretty well, and is high volume. This > > query is not new and is also medium volume. Database rebooted in > > about 4 seconds with no damage; fast enough we didn't even trip alarms > > (I noticed this troubleshooting another issue). We are a couple of > > bug fixes releases behind but I didn't see anything obvious in the > > release notes suggesting a resolved issue. Anyone have any ideas? > > thanks in advance. > > > postgres: rms ysconfig 10.33.190.21(36788) > > SELECT(ExecHashJoin+0x5a2)[0x5e2d32] > > Hi Merlin, > > Where's the binary from (exact package name, if installed with a > package)? Not sure if this is going to help, but is there any chance > you could disassemble that function so we can try to see what it's > doing at that offset? For example on Debian if you have > postgresql-9.6 and postgresql-9.6-dbg installed you could run "gdb > /usr/lib/postgresql/9.6/bin/postgres" and then "disassemble > ExecHashJoin". The code at "<+1442>" (0x5a2) is presumably calling > free or some other libc thing (though I'm surprised not to see an > intervening palloc thing). Thanks -- great suggestion. I'll report back with any interesting findings. merlin
double free in ExecHashJoin, 9.6.12
Server is generally running pretty well, and is high volume. This query is not new and is also medium volume. Database rebooted in about 4 seconds with no damage; fast enough we didn't even trip alarms (I noticed this troubleshooting another issue). We are a couple of bug fixes releases behind but I didn't see anything obvious in the release notes suggesting a resolved issue. Anyone have any ideas? thanks in advance. merlin *** glibc detected *** postgres: rms ysconfig 10.33.190.21(36788) SELECT: double free or corruption (!prev): 0x01fb2140 *** === Backtrace: = /lib64/libc.so.6(+0x75dee)[0x7f4fde053dee] /lib64/libc.so.6(+0x78c80)[0x7f4fde056c80] postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecHashJoin+0x5a2)[0x5e2d32] postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecProcNode+0x208)[0x5cf728] postgres: rms ysconfig 10.33.190.21(36788) SELECT(standard_ExecutorRun+0x18a)[0x5cd1ca] postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x6e5607] postgres: rms ysconfig 10.33.190.21(36788) SELECT(PortalRun+0x188)[0x6e67d8] postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x6e2af3] postgres: rms ysconfig 10.33.190.21(36788) SELECT(PostgresMain+0x75a)[0x6e456a] postgres: rms ysconfig 10.33.190.21(36788) SELECT(PostmasterMain+0x1875)[0x6840b5] postgres: rms ysconfig 10.33.190.21(36788) SELECT(main+0x7a8)[0x60b528] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f4fddffcd1d] postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x46c589] 2019-07-23 09:41:41 CDT[:@]:LOG: server process (PID 18057) was terminated by signal 6: Aborted 2019-07-23 09:41:41 CDT[:@]:DETAIL: Failed process was running: SELECT JR.job_id as jobId,JR.job_execution_id as jobResultId,JR.created as lastRunDate, JR.status as status, JR.status_message as statusMessage, JR.output_format as outputFormat, JS.schedule_name as scheduleName, JS.job_name as reportName, JS.created_by as scheduledBy, JS.product as source FROM (SELECT JR.job_id, MAX(JR.created) AS MaxCreated FROM job_schedule JS JOIN job_result JR ON 
JR.job_id=JS.job_id WHERE (lower(JS.recepients) like lower($1) OR lower(JS.created_by) = lower($2)) GROUP BY JR.job_id) TMP JOIN job_result JR ON JR.job_id = TMP.job_id AND JR.created = TMP.MaxCreated JOIN job_schedule JS ON JS.job_id = JR.job_id AND JS.job_type='CRON' merlin
Re: crosstab/repivot...any interest?
On Tue, Feb 26, 2019 at 8:31 AM Joe Conway wrote: > On 2/25/19 8:34 PM, Merlin Moncure wrote: > > The interface I'm looking at is: > > SELECT repivot( > > query TEXT, > > static_attributes INT, /* number of static attributes that are > > unchanged around key; we need this in our usages */ > > attribute_query TEXT); /* query that draws up the pivoted attribute > > list */ > > > > The driving query is expected to return 0+ static attributes which are > > essentially simply pasted to the output. The next two columns are the > > key column and the pivoting column. So if you had three attributes, > > the input set would be: > > > > a1, a2, a3, k1, p, v1...vn > > > > Where the coordinates v and p would exchange. I need to get this done > > quickly and so am trying to avoid more abstracted designs (maybe multi > > part keys should be supported though...this is a big limitation of > > crosstab albeit with some obnoxious work arounds). > > Perhaps not enough coffee yet, but I am not sure I fully grok the design > here. A fully fleshed out example would be useful. Thanks for your interest. A lot of our application data is organized in columnar type formats. Typically, the analytical reports on the back of the database are organized with the output columns being a KPI or a unit of time. KPI Columns: Country,Make,Model,TimeSlice,Sales,Units US,Ford,Fusion,Y2018Q1,165MM$,250k US,Ford,Fusion,Y2018Q2,172MM$,261k US,Ford,Fusion,Y2018Q3,183MM$,268k ... Time Columns: Country,Make,Model,KPI,Y2018Q1,Y2018Q2,Y2018Q3 US,Ford,Fusion,Sales,165MM$,172MM$,183MM$ US,Ford,Fusion,Units,250k,261k,268k SELECT repivot( , 1, /* only one static attribute */ 2, /* number of key columns, only needed if multi part keys are supported */ ); In this example supporting multi-part, the repivot column is the 4th, assumed to be attributes first, keys second, pivot column third, data block last. 
Multi column pivots might be considered but I don't need them and that's a significant expansion in scope, so I'm avoiding that consideration. What we need to do is convert from the first format above (which is how data is materialized in the table) to the second format. Just as within crosstab, if the key column(s) are ordered into the function we can exploit that structure for an efficient implementation. 'Ford' and 'Fusion' are key columns; 'US' is an uninteresting attribute column (it is unambiguously represented by 'Ford', 'Fusion') and so would simply be copied from input to output set. Our data is materialized in KPI style (which is pretty typical) but we have large analytical reports that want it with columns representing units of time. This is fairly common in the industry IMO. There are various pure SQL approaches to do this but they tend to force you to build out the X,Y columns and then join it all back together; this can spiral out of control quite quickly from a performance standpoint. A crosstab style crawling cursor over an ordered set ought to get the job done very efficiently. tablefunc doesn't support multi part keys today, and the workaround is that you have to stuff a key into a formatted string and then parse it out for downstream joining; some of the need to do that joining might hopefully be eliminated by allowing the attribute columns to be copied through. merlin
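For comparison, one of the "pure SQL approaches" mentioned above might look like the following against the sample data, using filtered aggregates (table and column names are assumed); every output time column has to be built out by hand, which is part of why it scales poorly:

```sql
-- KPI-style rows repivoted to time-style rows: one branch per KPI,
-- one hand-written column per time slice
SELECT country, make, model, 'Sales' AS kpi,
       max(sales) FILTER (WHERE timeslice = 'Y2018Q1') AS y2018q1,
       max(sales) FILTER (WHERE timeslice = 'Y2018Q2') AS y2018q2,
       max(sales) FILTER (WHERE timeslice = 'Y2018Q3') AS y2018q3
  FROM sales_by_quarter
 GROUP BY country, make, model
UNION ALL
SELECT country, make, model, 'Units',
       max(units) FILTER (WHERE timeslice = 'Y2018Q1'),
       max(units) FILTER (WHERE timeslice = 'Y2018Q2'),
       max(units) FILTER (WHERE timeslice = 'Y2018Q3')
  FROM sales_by_quarter
 GROUP BY country, make, model;
```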
crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 9:14 PM Morris de Oryx wrote: > > Hello, I'm not a C coder and can't help but I love cross-tab/pivot tables. They're the best, and just fantastic for preparing data to feed into various analysis tools. The tablefunc module is helpful, but a bit awkward to use (as I remember it.) > > From a user's point of view, a high-performance cross-tab generator would be just fantastic. > > As I understand it, this is what's involved in a pivot: > > 1. Identify rows that should be grouped (consolidated.) > 2. Distinguish the value that identifies each derived column. > 3. Distinguish the value that identifies each row-column value. > 4. Collapse the rows, build the columns, and populate the 'cells' with data. > > In an ideal world, you would be able to perform different grouping operations. Such as count, sum, avg, etc. > > If there's a way to do this in a system-wide and standards-pointing way, so much the better. > > Apologies if I'm violating list etiquette by jumping in here. I've been lurking on several Postgres lists for a bit and picking up interesting details every day. If I've been Unintentionally and Cluelessly Off, I'm fine with being told. No worries, sir! Apologies on the late reply. I've made some headway on this item. Waiting for postgres to implement the SQL standard pivoting (regardless of whether it implements the cases I need) is not an option for my personal case. I can't use the SQL approach either as it's very slow and imposes some scaling limits that we need to work around in the short run. My strategy is to borrow [steal] from crosstab_hash and make a new version called repivot which takes an already pivoted data set and repivots it against an identified column. Most of the code can be shared with tablefunc so ideally this could live as an amendment to that extension. 
That might not happen though, so I'm going to package it as a separate extension (removing the majority of tablefunc that I don't need) and submit it to this group for consideration. If we punt, it'll end up as a private extension or live the life of an orphan in pgxn. If there's some interest here, we can consider a new contrib extension (which I personally rate very unlikely) or recasting as an extra routine to tablefunc. Any way we slice it, huge thanks to Joe Conway for giving us such an awesome function to work with all these years (not to mention the strategic plr language). SRF crosstab() is still somewhat baroque, but it still fills a niche that nothing else implements. The interface I'm looking at is: SELECT repivot( query TEXT, static_attributes INT, /* number of static attributes that are unchanged around key; we need this in our usages */ attribute_query TEXT); /* query that draws up the pivoted attribute list */ The driving query is expected to return 0+ static attributes which are essentially simply pasted to the output. The next two columns are the key column and the pivoting column. So if you had three attributes, the input set would be: a1, a2, a3, k1, p, v1...vn Where the coordinates v and p would exchange. I need to get this done quickly and so am trying to avoid more abstracted designs (maybe multi part keys should be supported though...this is a big limitation of crosstab albeit with some obnoxious work arounds). merlin
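A purely hypothetical invocation of the proposed interface (nothing here was ever committed; the record definition in the AS clause follows the usual crosstab() convention, and the table/column names are invented):

```sql
SELECT *
  FROM repivot(
    $$ SELECT country, model, timeslice, sales
         FROM sales_by_quarter ORDER BY model $$,   -- driving query
    1,                                              -- one static attribute
    $$ SELECT DISTINCT timeslice
         FROM sales_by_quarter ORDER BY 1 $$        -- pivoted attribute list
  ) AS t(country text, model text,
         y2018q1 text, y2018q2 text, y2018q3 text);
```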
Re: Early WIP/PoC for inlining CTEs
On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera wrote: > > On 2019-Feb-14, Peter Eisentraut wrote: > > > On 14/02/2019 16:11, Tom Lane wrote: > > > ... so, have we beaten this topic to death yet? Can we make a decision? > > > > > > Personally, I'd be happy with either of the last two patch versions > > > I posted (that is, either AS [[NOT] MATERIALIZED] or > > > AS [MATERIALIZE [ON|OFF]] syntax). But we gotta pick something. > > > > If we're not really planning to add any more options, I'd register a > > light vote for MATERIALIZED. It reads easier, seems more grammatically > > correct, and uses an existing word. > > +1 for MATERIALIZED, as I proposed in > https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql Seconded! merlin
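For reference, what ultimately shipped (in PostgreSQL 12) was the keyword form, with inlining as the default:

```sql
-- default: the planner is free to inline the CTE into the outer query
WITH w AS (SELECT * FROM big_tab WHERE key = 123)
SELECT * FROM w;

-- restore the old optimization-fence behavior explicitly:
WITH w AS MATERIALIZED (SELECT * FROM big_tab WHERE key = 123)
SELECT * FROM w;

-- force inlining even where the planner would otherwise materialize
-- (e.g. a CTE referenced more than once):
WITH w AS NOT MATERIALIZED (SELECT * FROM big_tab WHERE key = 123)
SELECT * FROM w, w AS w2 WHERE w.key = w2.key;
```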
Re: Early WIP/PoC for inlining CTEs
On Tuesday, January 29, 2019, Tom Lane wrote: > Merlin Moncure writes: > > On Tue, Jan 29, 2019 at 1:53 PM Tom Lane wrote: > >> I propose that we implement and document this as > >> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query ) > > > I hate to bikeshed here, but I think it's better english using that > > style of syntax to say, > > WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query ) > > Hm. Doesn't really seem to fit with our style for options elsewhere; > for instance, EXPLAIN's options are things like ANALYZE ON/OFF and > VERBOSE ON/OFF, not ANALYSIS and VERBOSITY. > > regards, tom lane > Yep...I'll concede the point. merlin
Re: Early WIP/PoC for inlining CTEs
On Tue, Jan 29, 2019 at 1:53 PM Tom Lane wrote: > > Michael Paquier writes: > > On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote: > >> Yeah, I thought about that too, but it doesn't seem like an improvement. > >> If the query is very long (which isn't unlikely) I think people would > >> prefer to see the option(s) up front. > > > Having these options at the front of the WITH clause looks more > > natural to me. > > Well, we've managed to get agreement on the semantics of this thing, > let's not get hung up on the syntax details. > > I propose that we implement and document this as > > WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query ) > > which is maybe a bit clunky but not awful, and it would leave room > to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we > ever need to. Looking at the precedent of e.g. EXPLAIN, we could > probably allow just "MATERIALIZE" as well, with the boolean value > defaulting to true. I hate to bikeshed here, but I think it's better english using that style of syntax to say, WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query ) merlin
Re: crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 3:16 PM David Fetter wrote: > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote: > > Hackers, > > > > We have a strong need to make a variant to the crosstab interface so > > that data that is pivoted one way would be sent through a crosstab > > like function so that it would be pivoted another way. For example, > > if you had > > > > row 0: a1, a2, a3, k1, c1, c2, ... > > row 1: a1, a2, a3, k2, c1, c2, ... > > row 2: a1, a2, a3, k3, c1, c2, ... > > ... > > > > where 'a' columns are uninteresting attribute columns, 'k' is the > > dimension we want to pivot on, and c1->cN would be stacked vertically, > > so that we'd end up with, > > row 0: a1, a2, a3, c1, k1, k2, ... > > row 1: a1, a2, a3, c2, k1, k2, ... > > row 2: a1, a2, a3, c3, k1, k2, ... > > > > There are various SQL level approaches to this but they tend to be > > imperformant with large datasets so that I think a crosstab-like C > > implementation ought to be able to do better (or at least I hope so) > > since you have to cross product rows and columns in such a way that > > you can get a clean join. Cribbing from tablefunc.c I don't think > > this is a terrible challenge to do in hash table style. > > > > Questions on the table: > > *) Has anyone done anything like this or know of any current > > implementations? > > *) Would there be any interest in expanding tablefunc along these lines? > > There's something in SQL:2016 that I read as crosstabs, or at least as > enabling crosstabs. > https://www.iso.org/standard/69776.html > > If we're going to put work into crosstabs, it seems to me that the > "we" needs to be the project as a whole, and the work should be, to > the extent reasonable, toward standard compliance. Interesting. Do you see that the spec (it makes my brain hurt) can handle that kind of repivoting? merlin
crosstab/repivot...any interest?
Hackers, We have a strong need to make a variant to the crosstab interface so that data that is pivoted one way would be sent through a crosstab like function so that it would be pivoted another way. For example, if you had row 0: a1, a2, a3, k1, c1, c2, ... row 1: a1, a2, a3, k2, c1, c2, ... row 2: a1, a2, a3, k3, c1, c2, ... ... where 'a' columns are uninteresting attribute columns, 'k' is the dimension we want to pivot on, and c1->cN would be stacked vertically, so that we'd end up with, row 0: a1, a2, a3, c1, k1, k2, ... row 1: a1, a2, a3, c2, k1, k2, ... row 2: a1, a2, a3, c3, k1, k2, ... There are various SQL level approaches to this but they tend to be imperformant with large datasets so that I think a crosstab-like C implementation ought to be able to do better (or at least I hope so) since you have to cross product rows and columns in such a way that you can get a clean join. Cribbing from tablefunc.c I don't think this is a terrible challenge to do in hash table style. Questions on the table: *) Has anyone done anything like this or know of any current implementations? *) Would there be any interest in expanding tablefunc along these lines? thanks in advance, merlin
Re: Early WIP/PoC for inlining CTEs
On Tue, Jul 24, 2018 at 6:10 PM Andres Freund wrote: > My read of the consensus (in which I am in the majority, so I might be > biased) is that we do want inlining to be the default. We were thinking > that it'd be necessary to provide a way to force inlining on the SQL > level for individual CTEs. This is correct. I'm suggesting that we need syntax for disabling inlining at the CTE level, and/or a GUC to control the behavior (which I agree should default to inlining). Something like enable_cte_inline=true; I'm not very enthusiastic about explicitly breaking intentionally introduced optimization fences and then forcing people to inject our OFFSET 0 hack. This is just too unpleasant to contemplate...what happens if we come up with a better implementation of OFFSET? yuck. Thanks for providing this, CTE plan problems are a real bugaboo. merlin
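For readers unfamiliar with it, the OFFSET 0 trick referenced above fences a subquery much as CTEs historically did implicitly; a sketch (table and function names are illustrative):

```sql
-- normally the planner flattens the subquery into the outer query:
SELECT * FROM (SELECT * FROM t WHERE f(x)) s WHERE s.y = 1;

-- OFFSET 0 is a no-op for the result but blocks the pull-up, so the
-- inner query is planned as written -- the fence CTEs used to provide:
SELECT * FROM (SELECT * FROM t WHERE f(x) OFFSET 0) s WHERE s.y = 1;
```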
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 5:15 PM Andrew Dunstan wrote: > On 11/02/2018 05:20 PM, Andres Freund wrote: > > Hi, > > > > On 2018-11-02 17:02:24 -0400, Andrew Dunstan wrote: > >> On 11/02/2018 11:34 AM, Merlin Moncure wrote: > >>> Binary format consuming applications already have to deal with these > >>> kinds of issues. We already expose internal structures in the other > >>> functions -- not sure why jsonb is held to a different standard. For > >>> other data types where format changes were made, the standard of > >>> 'caveat version' was in place to protect the user. For jsonb we > >>> decided to implement a version flag within the type itself, which I > >>> thought a mistake at the time -- better to have a version header in the > >>> COPY BINARY if needed. > >>> > >> > >> jsonb_send does output a version header, as I pointed out upthread. > > That's Merlin's point I think. For reasons I don't quite understand he > > doesn't like that. Yes, a global solution would have been prettier than > > per-datum version flag, but that obviously wasn't realistic to introduce > > around the feature freeze of the version that introduced jsonb. > > > Oh, hmm. It would have been a big change of little value, ISTM. One byte > of overhead per jsonb datum for a version indicator doesn't seem a huge > price to pay. Yeah -- it really isn't. My concern was more that it seemed odd to protect one type with a version flag where others aren't; format agreement strikes me as more of a protocol negotiation thing than an aspect of each individual data point. It also makes for slightly odd (IMO) client side coding. The contract is of concern, not the overhead. It works well enough though...we discussed this a bit when the header was introduced and the discussion ended exactly the same way :-). merlin
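The per-datum version flag under discussion is easy to see from SQL; in the current implementation jsonb_send emits a one-byte version (1) followed by the text rendering of the value (observed behavior, not a wire-format guarantee):

```sql
-- the leading 01 is the jsonb binary-format version byte:
SELECT jsonb_send('{"a": 1}'::jsonb);
-- e.g. \x017b2261223a20317d   (0x01, then the text '{"a": 1}')
```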
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 2:34 PM Stephen Frost wrote: > * Merlin Moncure (mmonc...@gmail.com) wrote: > As for what language it's written in- I don't think that matters much. > I'd very much expect it to be more performant to use binary if you're > working in C, of course, but there's no point comparing C-parsed json > into some C structure vs. psycopg2 ingesting binary data and building a > Python json object- what matters is if it'd be faster for psycopg2 to > pull in binary-json data and put it into a Python json object, or if > it'd be faster to parse the text-json data and put the result into the > same Python object. In my view, there's something clearly quite wrong > if the text-json data format is faster at that. Yeah, I figure the language would be C or another language with drivers smart enough to speak the binary protocol. > > In my experience with arrays and composites, you can see significant > > latency reduction and throughput increase in certain classes of > > queries. However, some of the types that were the worst offenders > > (like timestamps) have been subsequently optimized and/or are > > irrelevant to json since they'd be passed as text anyways. > > I've had very good success transferring timestamps as binary, so I'm not > quite sure what you're getting at here. Er, yes, timestamps are much faster in binary -- that is what I'd observed and was the point I was trying to make. The advantage has shrunk somewhat with recent optimizations, but binary is still faster. In the old days I saw as much as +50% throughput binary vs text in certain contrived situations; that may no longer be true today. merlin
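To make the timestamp example concrete: with integer datetimes (the default for many releases now), the binary wire format for a timestamp is an 8-byte big-endian integer of microseconds since 2000-01-01, versus a text string the client has to format and re-parse. A rough sketch of the binary encode/decode path:

```python
import struct
from datetime import datetime, timedelta

PG_EPOCH = datetime(2000, 1, 1)  # postgres timestamps count from 2000-01-01

def ts_to_binary(ts: datetime) -> bytes:
    """Encode in the shape of the binary wire format: int64 usec, network order."""
    delta = ts - PG_EPOCH
    # exact integer arithmetic; avoids float rounding from total_seconds()
    usec = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack("!q", usec)

def ts_from_binary(data: bytes) -> datetime:
    """Decode eight fixed bytes back into a datetime."""
    usec, = struct.unpack("!q", data)
    return PG_EPOCH + timedelta(microseconds=usec)
```

Decoding here is a fixed-width shift-and-add, while the text path has to render and then re-parse 'YYYY-MM-DD HH:MM:SS.ffffff' on every value, which is where the observed throughput difference comes from.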
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 11:15 AM Stephen Frost wrote: > > Greetings, > > * Merlin Moncure (mmonc...@gmail.com) wrote: > > On Fri, Nov 2, 2018 at 10:53 AM Tom Lane wrote: > > > Andres' point about alignment is a pretty good one as well, if it applies > > > here --- I don't recall just what internal alignment requirements jsonb > > > has. We have not historically expected clients to have to deal with that. > > > > I see your (and Andres') point; the binary wire format ought to lie on > > top of the basic contracts established by other types. It can be > > binary; just not a straight memcpy out of the server. The array and > > composite type serializers should give some inspiration there on > > serialization. > > Right- I agree w/ Tom and Andres on this part also. > > > I'll still stand by the other point I made, though; I'd > > really want to see some benchmarks demonstrating benefit over > > competing approaches that work over the current formats. That should > > frame the argument as to whether this is a good idea. > > What are the 'competing approaches' you're alluding to here? Sending > text-format json across as we do today? Yep -- exactly. For example, write a C client program that recursed the structure and dumped it to stdout or assigned to dummy variables (being mindful of compiler optimizations). I'd contrast this to a C client parsing text json that did essentially the same thing, and rig a high scale test on the back of that. The assumption here is that the ultimate consumer is not, say, a browser, but some client app that can actually exploit the performance advantages (else, why bother?). In my experience with arrays and composites, you can see significant latency reduction and throughput increase in certain classes of queries. However, some of the types that were the worst offenders (like timestamps) have been subsequently optimized and/or are irrelevant to json since they'd be passed as text anyways. merlin
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 10:53 AM Tom Lane wrote: > Merlin Moncure writes: > > On Wed, Oct 31, 2018 at 10:23 AM Andres Freund wrote: > >> It's entirely unacceptable afaict. Besides the whole "exposing > >> internals" issue, it's also at least not endianness safe, depends on the > >> local alignment requirements (which differ both between platforms and > >> 32/64 bit), numeric's internal encoding and probably more. > > > Binary format consuming applications already have to deal with these > > kinds of issues. We already expose internal structures in the other > > functions -- not sure why jsonb is held to a different standard. > > I don't think it's being held to a different standard at all. Even for > data as simple as integers/floats, we convert to uniform endianness on the > wire. Moreover, we do not expose the exact bits for anything more complex > than machine floats. Numeric, for instance, gets disassembled into fields > rather than showing the exact header format (let alone several different > header formats, as actually found on disk). > > Andres' point about alignment is a pretty good one as well, if it applies > here --- I don't recall just what internal alignment requirements jsonb > has. We have not historically expected clients to have to deal with that. I see your (and Andres') point; the binary wire format ought to lie on top of the basic contracts established by other types. It can be binary; just not a straight memcpy out of the server. The array and composite type serializers should give some inspiration there on serialization. I'll still stand by the other point I made, though; I'd really want to see some benchmarks demonstrating benefit over competing approaches that work over the current formats. That should frame the argument as to whether this is a good idea. merlin
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Wed, Oct 31, 2018 at 10:23 AM Andres Freund wrote: > > Hi, > > On 2018-10-31 11:13:13 -0400, Andrew Dunstan wrote: > > I agree that just sending a blob of the internal format isn't a great idea. > > It's entirely unacceptable afaict. Besides the whole "exposing > internals" issue, it's also at least not endianness safe, depends on the > local alignment requirements (which differ both between platforms and > 32/64 bit), numeric's internal encoding and probably more. Binary format consuming applications already have to deal with these kinds of issues. We already expose internal structures in the other functions -- not sure why jsonb is held to a different standard. For other data types where format changes were made, the standard of 'caveat version' was in place to protect the user. For jsonb we decided to implement a version flag within the type itself, which I thought was a mistake at the time -- better to have a version header in the COPY BINARY if needed. People using binary format are basically interested in one thing: performance. It's really the fastest way to get data in and out of the database. For my part, I'd like to see some observed demonstrable benefit in terms of performance to justify making the change. merlin
Re: transction_timestamp() inside of procedures
On Wed, Sep 26, 2018 at 10:55 AM Alvaro Herrera wrote: > > On 2018-Sep-26, Tom Lane wrote: > > > Alvaro Herrera writes: > > > On 2018-Sep-26, Tom Lane wrote: > > >> I agree that it would be surprising for transaction timestamp to be newer > > >> than statement timestamp. So for now at least, I'd be satisfied with > > >> documenting the behavior. > > > > > Really? I thought it was practically obvious that for transaction- > > > controlling procedures, the transaction timestamp would not necessarily > > > be aligned with the statement timestamp. The surprise would come > > > together with the usage of the new feature, so existing users would not > > > be surprised in any way. > > > > Nope. That's the same poor reasoning we've fallen into in some other > > cases, of assuming that "the user" is a point source of knowledge. > > But DBMSes tend to interact with lots of different code. If some part > > of application A starts using intraprocedure transactions, and then > > application B breaks because it wasn't expecting to see xact_start > > later than query_start in pg_stat_activity, you've still got a problem. > > While that's true, I think it's also highly hypothetical. > > What could be the use for the transaction timestamp? I think one of the > most important uses (at least in pg_stat_activity) is to verify that > transactions are not taking an excessively long time to complete; +1 I think the existing behavior is broken, and extremely so. Transaction timestamp has a very clear definition to me. I'm planning to move a lot of code into stored procedures from bash, and upon doing so it's going to trip all kinds of nagios alarms that are looking at the longest running transaction. merlin
Re: Stored procedures and out parameters
On Mon, Sep 17, 2018 at 7:45 AM Jonathan S. Katz wrote: > > Hi, > > On 9/2/18 4:32 PM, Robert Haas wrote: > > On Thu, Aug 30, 2018 at 4:14 PM, Dave Cramer wrote: > >> Reading this from the (JDBC) drivers perspective, which is probably a > >> fairly > >> popular one, > >> We now have a standard that we can't really support. Either the driver will > >> have to support > >> the new PROCEDURE with the {call } mechanism or stay with the existing > >> FUNCTIONS. > >> This puts the drivers in a no win situation. > > > > I understand and I agree. > > > >> Undoubtedly.. surely the opportunity to do something about this has not > >> passed as this has not been > >> officially released ? > > > > I agree with that, too, but I can't make other people do things they > > don't want to do, and then there's the question of time. On the basis > > of previous experience, there is going to be little appetite for > > holding up the release to address this issue no matter how badly > > busted it is. Ultimately, it's the RMT that must decide what to do in > > cases like this. I have confidence that they are watching this > > thread, but I don't know what they will decide to do about it. > > > > First, I want everyone to know that the RMT took this very seriously and > took time collect feedback and consult with as many people as we could > in order to make the best possible decision for v11 and ensure that any > decision we made did not hinder any future implementation for stored > procedures nor introduced something that would break backwards > compatibility. > > Ultimately, the decision came down to one of four options: > > #1: Do nothing and remove the open item > #2: Introduce nonstandard syntax for calling functions / procedures > #3: Have CALL support both functions & procedures (or SELECT support > calling functions) > #4: Disable CALL > > The RMT has decided to go with option #1 and will be removing the open > item for the PostgreSQL 11 release. 
Hooray for making the right choice. This is exhibit "Z" as to why abstracting the SQL language behind a programming API can lead to problems. The workaround is to simply not do that and you can get precise control of behavior. It's a little unfortunate that API {call} maps to a select but such is life. merlin
Re: I'd like to discuss scaleout at PGCon
On Fri, Jun 22, 2018 at 12:34 PM Bruce Momjian wrote: > > On Fri, Jun 1, 2018 at 11:29:43AM -0500, Merlin Moncure wrote: > > FWIW, Distributed analytical queries is the right market to be in. > > This is the field in which I work, and this is where the action is at. > > I am very, very, sure about this. My view is that many of the > > existing solutions to this problem (in particular hadoop class > > soltuions) have major architectural downsides that make them > > inappropriate in use cases that postgres really shines at; direct > > hookups to low latency applications for example. postgres is > > fundamentally a more capable 'node' with its multiple man-millennia of > > engineering behind it. Unlimited vertical scaling (RAC etc) is > > interesting too, but this is not the way the market is moving as > > hardware advancements have reduced or eliminated the need for that in > > many spheres. > > > > The direction of the project is sound and we are on the cusp of the > > point where multiple independent coalescing features (FDW, logical > > replication, parallel query, executor enhancements) will open new > > scaling avenues that will not require trading off the many other > > benefits of SQL that competing contemporary solutions might. The > > broader development market is starting to realize this and that is a > > major driver of the recent upswing in popularity. This is benefiting > > me tremendously personally due to having gone 'all-in' with postgres > > almost 20 years ago :-D. (Time sure flies)These are truly > > wonderful times for the community. > > While I am glad people know a lot about how other projects handle > sharding, these can be only guides to how Postgres will handle such > workloads. I think we need to get to a point where we have all of the > minimal sharding-specific code features done, at least as > proof-of-concept, and then test Postgres with various workloads like > OLTP/OLAP and read-write/read-only. 
> This will tell us where sharding-specific code will have the greatest impact. > > What we don't want to do is to add a bunch of sharding-specific code > without knowing which workloads it benefits, and how many of our users > will actually use sharding. Some projects have done that, and it > didn't end well since they then had a lot of product complexity with > little user value. Key features from my perspective: *) FDW in parallel. How do I do it today? Ghetto-implemented parallel queries with asynchronous dblink *) column store *) automatic partition management through shards Probably some more, gotta run :-) merlin
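The "ghetto parallel FDW" workaround amounts to scatter/gather: fire the same query at every shard asynchronously (dblink_send_query), then collect the results (dblink_get_result) and concatenate. A toy sketch of that control flow, with a stub standing in for the per-node dblink round trip:

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_shard(shard_id, query):
    # stand-in for dblink_send_query()/dblink_get_result() against one node;
    # here we just fabricate three rows tagged with the shard id
    return [(shard_id, row) for row in range(3)]

def scatter_gather(shards, query):
    """Issue the query on every shard concurrently, then append the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(run_on_shard, s, query) for s in shards]
        results = []
        for f in futures:
            results.extend(f.result())
    return results
```

The asynchronous-execution work discussed in this thread would effectively push this gather step into the executor itself instead of leaving it to application code.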
Re: I'd like to discuss scaleout at PGCon
On Wed, May 30, 2018 at 9:26 PM Robert Haas wrote: > The FDW approach, of which I have been a supporter for some years now, > is really aiming at a different target, which is to allow efficient > analytics queries across a multi-node cluster. I think we're getting > pretty close to being able to do that -- IMHO, the last fundamental > building block that we need is asynchronous execution, which Andres is > working on. After that, it's a matter of adding other features that > people want (like cross-node MVCC) and improving the plans for queries > that still don't perform well (like joins that involve redistributing > one of the tables involved). FWIW, distributed analytical queries are the right market to be in. This is the field in which I work, and this is where the action is at. I am very, very, sure about this. My view is that many of the existing solutions to this problem (in particular hadoop-class solutions) have major architectural downsides that make them inappropriate in use cases that postgres really shines at; direct hookups to low latency applications for example. postgres is fundamentally a more capable 'node' with its multiple man-millennia of engineering behind it. Unlimited vertical scaling (RAC etc) is interesting too, but this is not the way the market is moving as hardware advancements have reduced or eliminated the need for that in many spheres. The direction of the project is sound and we are on the cusp of the point where multiple independent coalescing features (FDW, logical replication, parallel query, executor enhancements) will open new scaling avenues that will not require trading off the many other benefits of SQL that competing contemporary solutions might. The broader development market is starting to realize this and that is a major driver of the recent upswing in popularity. This is benefiting me tremendously personally due to having gone 'all-in' with postgres almost 20 years ago :-D. 
(Time sure flies.) These are truly wonderful times for the community. merlin
Re: [HACKERS] Clock with Adaptive Replacement
On Wed, May 9, 2018 at 11:00 AM Robert Haas wrote: > Independently of that, it would probably also be useful to avoid > bumping the reference count multiple times when a buffer is accessed > by the same backend several times in quick succession. Perhaps this > could even be as simple as maintaining a list of the last two buffer > IDs for which we bumped the usage count; if we see one of them again, > don't bump the usage count again. Hm. Is the objective here to optimize access patterns or to reduce atomic operations (or both)? All else being equal, an algorithm that delivers similar eviction results with less cache synchronization ought to be preferred...are the various algorithms scored in that way? merlin
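Robert's suggestion is cheap to prototype. A sketch of the dedup idea (the usage-count cap of 5 matches BM_MAX_USAGE_COUNT; the two-entry "recently touched" window is his hypothetical, and the class name is mine):

```python
class UsageCounter:
    """Clock-sweep usage counts with the proposed dedup: skip the
    bump when the buffer was one of the last two this backend touched."""
    MAX_USAGE = 5  # postgres caps usage_count at BM_MAX_USAGE_COUNT (5)

    def __init__(self):
        self.usage = {}
        self.recent = (None, None)  # last two buffer IDs touched

    def touch(self, buf_id):
        if buf_id in self.recent:
            return  # accessed again in quick succession; no bump, no atomic op
        self.usage[buf_id] = min(self.usage.get(buf_id, 0) + 1, self.MAX_USAGE)
        self.recent = (buf_id, self.recent[0])
```

Note the dedup serves both goals at once: it saves the atomic increment and it keeps hot-loop re-touches from inflating a buffer's eviction score.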
Re: Built-in connection pooling
On Fri, May 4, 2018 at 2:25 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >> If we are breaking 1:1 backend:session relationship, what controls >> would we have to manage resource consumption? > > I mean, if you have a large number of sessions open, it's going to > take more memory in any design. If there are multiple sessions per > backend, there may be some possibility to save memory by allocating it > per-backend rather than per-session; it shouldn't be any worse than if > you didn't have pooling in the first place. It is absolutely worse, or at least can be. plpgsql plan caches can be GUC dependent due to search_path; you might get a different plan depending on which tables resolve into the function. You might rightfully regard this as an edge case but there are other 'leakages', for example, sessions with different planner settings obviously ought not to share backend plans. Point being, there are many interdependent things in the session that will make it difficult to share some portions but not others. > However, I think that's probably worrying about the wrong end of the > problem first. IMHO, what we ought to start by doing is considering > what a good architecture for this would be, and how to solve the > general problem of per-backend session state. If we figure that out, > then we could worry about optimizing whatever needs optimizing, e.g. > memory usage. Exactly -- being able to manage down resource consumption by controlling session count is a major feature that ought not to be overlooked. So I'm kind of signalling that if given a choice between that (funneling a large pool of connections down to a smaller number of backends) and externalized shared sessions I'd rather have the funnel; it solves a number of very important problems with respect to server robustness. So I'm challenging (in a friendly, curious way) if breaking session:backend 1:1 is really a good idea. 
Maybe a connection pooler implementation can do both of those things, or maybe it's unfair to expect one implementation to do both. merlin
Re: Built-in connection pooling
On Thu, May 3, 2018 at 12:01 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure <mmonc...@gmail.com> wrote: >> What _I_ (maybe not others) want is a >> faster pgbouncer that is integrated into the database; IMO it does >> everything exactly right. > > I have to admit that I find that an amazing statement. Not that > pgbouncer is bad technology, but saying that it does everything > exactly right seems like a vast overstatement. That's like saying > that you don't want running water in your house, just a faster motor > for the bucket you use to draw water from the well. Well you certainly have a point there; I do have a strong tendency for overstatement :-). Let's put it like this: being able to have connections funnel down to a smaller number of sessions is a nice feature. Applications that are large, complex, or super high volume have a tendency towards stateless (with respect to the database session) architecture anyways, so I tend not to mind the lack of session features when pooling (prepared statements perhaps being the big outlier here). It really opens up a lot of scaling avenues. So a better-phrased statement might be, "I like the way pgbouncer works, in particular transaction mode pooling from the perspective of the applications using it". The current main pain points are the previously mentioned administrative headaches; better performance from a different architecture (pthreads vs. libev) would also be nice. I'm a little skeptical that we're on the right path if we are pushing a lot of memory consumption into the session level where a session is pinned all the way back to a client connection. plpgsql function plan caches can be particularly hungry on memory, and since sessions have their own GUCs, ISTM each session has to have its own set of them since plans depend on the search_path GUC, which is session specific. 
Previous discussions on managing cache memory consumption centrally (I dimly recall you making a proposal on that very thing) haven't gone past the planning stages AFAIK. If we are breaking the 1:1 backend:session relationship, what controls would we have to manage resource consumption? merlin
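The search_path hazard above is easy to illustrate: a cached plan is only valid for the GUC state it was planned under, so any shared plan cache would have to fold those settings into its key. A toy model (names and the fake "planning" step are mine, purely illustrative):

```python
class PlanCache:
    """Per-session plan cache sketch: the key must include session GUCs
    (search_path here) that change which objects a query resolves to,
    which is why plans can't be naively shared across sessions."""

    def __init__(self):
        self.plans = {}

    def get_plan(self, func_name, search_path):
        key = (func_name, tuple(search_path))  # GUC state is part of the key
        if key not in self.plans:
            # pretend to plan: resolve the table in the first schema on the path
            self.plans[key] = f"scan {search_path[0]}.t for {func_name}"
        return self.plans[key]
```

Two sessions calling the same plpgsql function under different search_path settings get two distinct cache entries, which is exactly the duplicated memory consumption being discussed.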
Re: Built-in connection pooling
On Fri, Apr 27, 2018 at 11:44 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > > > On 27.04.2018 18:33, Merlin Moncure wrote: >> On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik >> <k.knizh...@postgrespro.ru> wrote: >>> On 27.04.2018 16:49, Merlin Moncure wrote: >> I'm confused here...could be language issues or terminology (I'll look >> at your latest code). Here is how I understand things: >> Backend=instance of postgres binary >> Session=application state within postgres binary (temp tables, >> prepared statement etc) >> Connection=Client side connection > > Backend is a process, forked by postmaster. right, we are saying the same thing here. >> AIUI (I could certainly be wrong), withing connection pooling, ratio >> of backend/session is still 1:1. The idea is that client connections >> when they issue SQL to the server reserve a Backend/Session, use it >> for the duration of a transaction, and release it when the transaction >> resolves. So many client connections share backends. As with >> pgbouncer, the concept of session in a traditional sense is not really >> defined; session state management would be handled within the >> application itself, or within data within tables, but not within >> backend private memory. Does that align with your thinking? > > No. Number of sessions is equal to number of client connections. > So client is not reserving "Backend/Session" as it happen in pgbouncer. > One backend keeps multiple sessions. And for each session it maintains > session context which included client's connection. > And it is backend's decision transaction of which client it is going to > execute now. > This is why built-in pooler is able to provide session semantic without > backend/session=1:1 requirement. I see. I'm not so sure that is a good idea in the general sense :(. Connection sharing sessions is normal and well understood, and we have tooling to manage that already (DISCARD). 
Having the session state abstracted out and pinned to the client connection seems complex and wasteful, at least sometimes. What _I_ (maybe not others) want is a faster pgbouncer that is integrated into the database; IMO it does everything exactly right. merlin
Re: Built-in connection pooling
On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > On 27.04.2018 16:49, Merlin Moncure wrote: >> *) How are you pinning client connections to an application managed >> transaction? (IMNSHO, this feature is useless without being able to do >> that) > > Sorry, I do not completely understand the question. > Rescheduling is now done at transaction level - it means that the backend > cannot be switched to another session until the current transaction completes. > The main argument for transaction level pooling is that it allows one not to > worry about heavyweight locks, which are associated with procarray entries. I'm confused here...could be language issues or terminology (I'll look at your latest code). Here is how I understand things: Backend=instance of postgres binary Session=application state within postgres binary (temp tables, prepared statements etc) Connection=Client side connection AIUI (I could certainly be wrong), within connection pooling, the ratio of backends to sessions is still 1:1. The idea is that client connections, when they issue SQL to the server, reserve a backend/session, use it for the duration of a transaction, and release it when the transaction resolves. So many client connections share backends. As with pgbouncer, the concept of session in a traditional sense is not really defined; session state management would be handled within the application itself, or within data within tables, but not within backend private memory. Does that align with your thinking? merlin
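To restate the transaction-level model in code: the pool pins a backend to a client only for the life of a transaction, which is why anything kept in backend-private memory cannot be relied on across transactions. A toy sketch (class and method names are mine, not from the patch):

```python
from collections import deque

class TxnPooler:
    """Transaction-mode pooling sketch: a backend is reserved at
    transaction start and released at commit/rollback, so many client
    connections share a small, fixed set of backends."""

    def __init__(self, n_backends):
        self.free = deque(range(n_backends))
        self.pinned = {}  # client -> backend, for the current transaction only

    def begin(self, client):
        if not self.free:
            raise RuntimeError("all backends busy; client must wait")
        self.pinned[client] = self.free.popleft()

    def commit(self, client):
        # releasing at transaction end is what makes it "transaction level";
        # the next transaction may land on a different backend entirely
        self.free.append(self.pinned.pop(client))
```

The heavyweight-lock argument falls out of this: since a transaction never migrates mid-flight, its procarray entry and lock state stay on one backend.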
Re: Built-in connection pooling
On Thu, Apr 26, 2018 at 6:04 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > On 25.04.2018 20:02, Merlin Moncure wrote: >> Yep. The main workaround today is to disable them. Having said that, >> it's not that difficult to imagine hooking prepared statement creation >> to a backend starting up (feature: run X,Y,Z SQL before running user >> queries). > > Sorry, I do not completely understand your idea. > Yes, it is somehow possible to simulate session semantics by prepending all > session specific commands (mostly setting GUCs) to each SQL statement. > But it doesn't work for prepared statements: the idea of prepared statements > is that compilation of a statement should be done only once. The idea is that you have arbitrary SQL that runs after the backend (postgres binary) is forked from the postmaster. This would be an ideal place to introduce prepared statements in a way that is pooling compatible; you still couldn't PREPARE from the application but you'd be free to call already prepared statements (via SQL level EXECUTE or libpq PQexecPrepared()). Of course, if somebody throws a DEALLOCATE or DISCARD ALL, or issues a problematic DROP x CASCADE, you'd be in trouble, but that's not a big deal IMO because you can control for those things in the application. > Database performance is mostly limited by disk, so optimal number of > backends may be different from number of cores. > But certainly possibility to launch "optimal" number of backends is one of > the advantages of builtin session pooling. Sure, but some workloads are cpu limited (all- or mostly- read with data < memory, or very complex queries on smaller datasets). So we would measure and configure based on expectations exactly as is done today with pgbouncer. 
This is a major feature of pgbouncer: being able to _reduce_ the number of session states relative to the number of connections is an important feature; it isolates your database from various unpleasant failure modes such as runaway memory consumption. Anyways, I'm looking at your patch. I see you've separated the client connection count ('sessions') from the server backend instances ('backends') in the GUC. Questions: *) Should non pooled connections be supported simultaneously with pooled connections? *) Should there be multiple pools with independent configurations (yes, please)? *) How are you pinning client connections to an application managed transaction? (IMNSHO, this feature is useless without being able to do that) FYI, it's pretty clear you've got a long road building consensus and hammering out a reasonable patch through the community here. Don't get discouraged -- there is value here, but it's going to take some work. merlin
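The "run X,Y,Z SQL before running user queries" idea boils down to a per-backend startup hook: the prepared statements exist before any client transaction lands on the backend, so every client sees them regardless of which backend it is routed to. A sketch of the shape of it (the SETUP_SQL contents and the Backend class are hypothetical, not a real API):

```python
SETUP_SQL = [
    # hypothetical statements run once when a pooled backend starts,
    # so every client can EXECUTE the same prepared statements
    "PREPARE get_user(int) AS SELECT * FROM users WHERE id = $1",
    "SET work_mem = '64MB'",
]

class Backend:
    """Stand-in for a pooled backend that runs setup SQL at fork time."""

    def __init__(self, setup_sql):
        self.executed = []
        for sql in setup_sql:   # runs before any user query arrives
            self.execute(sql)

    def execute(self, sql):
        self.executed.append(sql)
        return f"ok: {sql.split()[0]}"
```

As noted above, this only stays coherent if the application refrains from DEALLOCATE/DISCARD ALL, which is under the application's control.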
Re: Built-in connection pooling
On Wed, Apr 25, 2018 at 2:58 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Apr 25, 2018 at 10:00 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >> systems. If we get that tor free I'd be all for it but reading >> Robert's email I'm skeptical there are easy wins here. So +1 for >> further R and -1 for holding things up based on full >> transparency...no harm in shooting for that, but let's look at things >> from a cost/benefit perspective (IMO). > > Also, I think it's worth considering that the kinds of failures users > will get out of anything that's not handled are really the worst kind. > If you have an application that relies on session state other than > what his patch knows how to preserve, your application will appear to > work in light testing because your connection won't actually be > swapped out underneath you -- and then fail unpredictably in > production when such swapping occurs. There will be no clear way to > tell which error messages or behavior differences are due to > limitations of the proposed feature, which ones are due to defects in > the application, and which ones might be due to PostgreSQL bugs. > They'll all look the same, and even experienced PG hackers won't Connection pooling is not a new phenomenon, and many stacks (in particular java) tend to pool connection by default. All of the problems we discuss here for the most part affect competitive solutions and I humbly submit the tradeoffs are _very_ widely understood. FWICT we get occasional reports that are simply and clearly answered. I guess there are some people dumb enough to flip GUC settings involving seemingly important things in production without testing or reading any documentation or the innumerable articles and blogs that will pop up...hopefully they are self selecting out of the industry :-). 
Looking at pgbouncer, they produce a chart that says, 'these features don't work, and please consider that before activating this feature' (https://wiki.postgresql.org/wiki/PgBouncer#Feature_matrix_for_pooling_modes) and that ought to be entirely sufficient to avoid that class of problems. This is very clear and simple. The main gripes with pgbouncer FWICT related to the postgres JDBC driver's unavoidable tendency (later fixed) to prepare 'BEGIN', causing various problems; that was really a bug (in the JDBC driver) which did in fact spill onto this list. For this feature to be really attractive we'd want to simultaneously allow pooled and non-pooled connections on different ports, or even multiple pools (say, for different applications). Looking at things from your perspective, we might want to consider blocking (with error) features that are not 'pooling compatible' if they arrive through a pooled connection. merlin
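Blocking pooling-incompatible features could be as simple as a statement screen at the pooled port, in the spirit of pgbouncer's feature matrix (the specific command list below is illustrative, not a proposal for the exact set):

```python
# commands that depend on session state a transaction-mode pool can't
# preserve across backend switches (illustrative list)
SESSION_ONLY = ("PREPARE", "LISTEN", "DECLARE", "SET ")  # SET LOCAL is fine

def check_pooled(sql: str):
    """Reject session-scoped commands arriving through a pooled connection."""
    stmt = sql.lstrip().upper()
    if stmt.startswith("SET LOCAL"):
        return  # transaction-scoped, so pooling-safe
    for prefix in SESSION_ONLY:
        if stmt.startswith(prefix):
            raise ValueError(
                f"{prefix.strip()} is not supported on pooled connections")
```

Failing fast like this converts the "appears to work in light testing, fails unpredictably in production" failure mode Robert describes into an immediate, diagnosable error.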
Re: Built-in connection pooling
On Wed, Apr 25, 2018 at 9:43 AM, Christophe Pettus <x...@thebuild.com> wrote: > >> On Apr 25, 2018, at 07:00, Merlin Moncure <mmonc...@gmail.com> wrote: >> The limitation headaches that I suffer with the pgbouncer project (which >> I love and use often) are mainly administrative and performance >> related, not lack of session based server features. > > For me, the most common issue I run into with pgbouncer (after general > administrative overhead of having another moving part) is that it works at > cross purposes with database-based sharding, as well as useful role and > permissions schemes. Since each server connection is specific to a > database/role pair, you are left with some unappealing options to handle that > in a pooling environment. Would integrated pooling help the sharding case (genuinely curious)? I don't quite have my head around the issue. I've always wanted pgbouncer to be able to do things like round-robin queries to a non-sharded replica for simple load balancing, but it doesn't (yet) have that capability. That type of functionality would not fit into an in-core pooler AIUI. Totally agree that the administrative benefits (user/role/.conf/etc/etc) are a huge win. > The next most common problem are prepared statements breaking, which > certainly qualifies as a session-level feature. Yep. The main workaround today is to disable them. Having said that, it's not that difficult to imagine hooking prepared statement creation to a backend starting up (feature: run X,Y,Z SQL before running user queries). This might be less effort than, uh, moving backend session state to a shareable object. I'll go further; managing cache memory consumption (say for pl/pgsql cached plans) is a big deal for certain workloads. The only really effective way to deal with that is to manage the server connection count and/or recycle server connections on intervals. 
Using pgbouncer to control backend count is a very effective way to deal with this problem, and allowing virtualized connections to each manage their independent cache would be a step in the opposite direction. I very much like having control so that I have exactly 8 backends for my 8 core server with 8 copies of cache. Advisory locks are a completely separate problem. I suspect they might be used more than you realize, and they operate against a very fundamental subsystem of the database: the locking engine. I'm struggling to see why we would take any approach other than 'don't use the non-xact variants of them in a pooling environment'. merlin
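The pgbouncer knob doing the work in the scenario above is the pool size. A minimal sketch (database name, ports, and sizes are hypothetical; check settings against the pgbouncer documentation for your version):

```ini
; pgbouncer.ini -- cap server backends regardless of client count
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; multiplex server connections across transactions
default_pool_size = 8     ; at most 8 backends (and 8 plan caches) per db/user pair
max_client_conn = 500     ; many clients funneled onto those 8 backends
```

Note default_pool_size applies per database/user pair, so the effective backend ceiling is the pool size times the number of distinct pairs in use.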
Re: Built-in connection pooling
On Wed, Apr 25, 2018 at 12:34 AM, Christophe Pettus <x...@thebuild.com> wrote: > >> On Apr 24, 2018, at 06:52, Merlin Moncure <mmonc...@gmail.com> wrote: >> Why does it have to be completely transparent? > > The main reason to move it into core is to avoid the limitations that a > non-core pooler has. The limitations headaches that I suffer with the pgbouncer project (which I love and use often) are mainly administrative and performance related, not lack of session based server features. Applications that operate over a very large number of virtual connections or engage a very high level of small transaction traffic are going to avoid session based features for a lot of other reasons anyways, at least in my experience. Probably the most useful feature I miss is async notifications, so much so that at one point we hacked pgbouncer to support them. Point being, full transparency is nice, but there are workarounds for most of the major issues and there are a lot of side channel benefits to making your applications 'stateless' (defined as state in application or database but not in between). Absent any other consideration, OP has proven to me that there are massive potential performance gains possible from moving the pooling mechanism into the database core process, and I'm already very excited about not having an extra server process to monitor and worry about. Tracking session state out of process seems pretty complicated and would probably add complexity or overhead to multiple internal systems. If we got that for free I'd be all for it, but reading Robert's email I'm skeptical there are easy wins here. So +1 for further R&D and -1 for holding things up based on full transparency...no harm in shooting for that, but let's look at things from a cost/benefit perspective (IMO). merlin
Re: Built-in connection pooling
On Mon, Apr 23, 2018 at 3:14 PM, Robert Haas wrote: > In other words, transparent connection pooling is going to require > some new mechanism, which third-party code will have to know about, > for tracking every last bit of session state that might need to be > preserved or cleared. That's going to be a big project. Maybe some > of that can piggyback on existing infrastructure like > InvalidateSystemCaches(), but there's probably still a ton of ad-hoc > state to deal with. And no out-of-core pooler has a chance of > handling all that stuff correctly; an in-core pooler will be able to > do so only with a lot of work. Why does it have to be completely transparent? As long as the feature is optional (say, a .conf setting) the tradeoffs can be managed. It's reasonable to expect to exchange some functionality for pooling; pgbouncer provides a 'release' query (say, DISCARD ALL) to be called upon release back to the pool. Having session state objects (not all of which we are talking about; advisory locks and notifications deserve consideration) 'just work' would be wonderful but ought not to hold up other usages of the feature. merlin
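For reference, the pgbouncer 'release' query mentioned above is configured via server_reset_query; a minimal sketch (the value shown is pgbouncer's documented default for session mode, but verify against your version's docs):

```ini
; pgbouncer.ini -- run a cleanup query when a server connection
; is released back to the pool (session pooling mode)
[pgbouncer]
pool_mode = session
server_reset_query = DISCARD ALL   ; drops prepared statements, temp tables, session GUCs, etc.
```

In transaction pooling mode the reset query is normally left empty, since there is no well-defined "release" point a client can rely on.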
Re: INOUT parameters in procedures
On Tue, Mar 20, 2018 at 4:19 PM, Merlin Moncure <mmonc...@gmail.com> wrote: > A) you can't assign output variables with into: > CALL p(1) INTO i; // gives syntax error > > B) you can't assign via assignment > i := p(1); // gives error, 'use CALL' > > C) but you *can* via execute > EXECUTE 'CALL p(1)' INTO i; // this works! > > ...I'm glad 'C' works, as without that there would be no useful way to > get values out of procedures called from within other > procedures/functions as things stand today. 'A' ideally also out to > work, but I'm not sure 'B' should be expected to work since it's > really a thin layer around SELECT. What do you think? Also (sorry for spam), A procedure created via: create procedure p() as $$begin call p(); end; $$ language plpgsql; ...will segfault when called -- there ought to be a stack depth check. merlin
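As a point of comparison, unbounded recursion in plpgsql functions normally errors cleanly once max_stack_depth is exceeded rather than crashing, which is presumably the check missing here. A hypothetical sketch of what a depth-guarded recursive procedure looks like once CALL works from within plpgsql (names are illustrative):

```sql
-- recursion with an explicit depth guard; without the guard one would
-- expect a "stack depth limit exceeded" error, not a segfault
CREATE PROCEDURE p(depth int DEFAULT 0) AS $$
BEGIN
  IF depth >= 100 THEN
    RAISE NOTICE 'stopping at depth %', depth;
    RETURN;
  END IF;
  CALL p(depth + 1);
END;
$$ LANGUAGE plpgsql;

CALL p();  -- terminates cleanly instead of exhausting the stack
```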
Re: INOUT parameters in procedures
On Tue, Mar 20, 2018 at 10:09 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > 2018-03-20 15:18 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>: >> >> postgres=# create or replace procedure p(a inout int default 7) as $$ >> >> begin return; end; $$ language plpgsql; >> >> CREATE PROCEDURE >> >> Time: 1.182 ms >> >> postgres=# call p(); >> >> a >> >> ─── >> >> 0 >> >> (1 row) >> > >> > >> > I wrote patch >> >> Confirmed this fixes the issue. > > Thanks for info You're welcome. Working with this feature some more, I noticed that: A) you can't assign output variables with into: CALL p(1) INTO i; // gives syntax error B) you can't assign via assignment i := p(1); // gives error, 'use CALL' C) but you *can* via execute EXECUTE 'CALL p(1)' INTO i; // this works! ...I'm glad 'C' works, as without that there would be no useful way to get values out of procedures called from within other procedures/functions as things stand today. 'A' ideally also ought to work, but I'm not sure 'B' should be expected to work since it's really a thin layer around SELECT. What do you think? merlin
Re: INOUT parameters in procedures
On Tue, Mar 20, 2018 at 9:09 AM, Pavel Stehule wrote: >> Edit: In one case, after dropping the function and recreating it, I >> got the procedure to return 0 where it had not before, so this smells >> like a bug. >> postgres=# call p(); >> 2018-03-20 09:04:50.543 CDT [21494] ERROR: function p() does not >> exist at character 6 >> 2018-03-20 09:04:50.543 CDT [21494] HINT: No function matches the >> given name and argument types. You might need to add explicit type >> casts. >> 2018-03-20 09:04:50.543 CDT [21494] STATEMENT: call p(); >> ERROR: function p() does not exist >> LINE 1: call p(); >> ^ >> HINT: No function matches the given name and argument types. You >> might need to add explicit type casts. >> Time: 0.297 ms >> postgres=# create or replace procedure p(a inout int default 7) as $$ >> begin return; end; $$ language plpgsql; >> CREATE PROCEDURE >> Time: 1.182 ms >> postgres=# call p(); >> a >> ─── >> 0 >> (1 row) > > > I wrote patch Confirmed this fixes the issue. merlin
Re: INOUT parameters in procedures
On Wed, Feb 28, 2018 at 4:28 PM, Peter Eisentraut wrote: > This patch set adds support for INOUT parameters to procedures. > Currently, INOUT and OUT parameters are not supported. > > A top-level CALL returns the output parameters as a result row. In > PL/pgSQL, I have added special support to pass the output back into the > variables, as one would expect. > > These patches apply on top of the "prokind" patch set v2. (Tom has > submitted an updated version of that, which overlaps with some of the > changes I've made here. I will work on consolidating that soon.) I did a pull from master to play around with INOUT parameters and got some strange interactions with DEFAULT. Specifically, DEFAULT doesn't do much beyond 'return the last supplied value given'. I'm not sure if this is expected behavior; it seems odd: postgres=# create or replace procedure p(a inout int default 7) as $$ begin return; end; $$ language plpgsql; CREATE PROCEDURE postgres=# call p(); a ─── (1 row) postgres=# call p(3); a ─── 3 (1 row) postgres=# call p(); a ─── 3 (1 row) I got null,3,3. I would have expected 7,3,7. Default arguments might remove quite a bit of the pain associated with having to supply bogus arguments to get the INOUT parameters working. Edit: In one case, after dropping the function and recreating it, I got the procedure to return 0 where it had not before, so this smells like a bug. postgres=# call p(); 2018-03-20 09:04:50.543 CDT [21494] ERROR: function p() does not exist at character 6 2018-03-20 09:04:50.543 CDT [21494] HINT: No function matches the given name and argument types. You might need to add explicit type casts. 2018-03-20 09:04:50.543 CDT [21494] STATEMENT: call p(); ERROR: function p() does not exist LINE 1: call p(); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. 
Time: 0.297 ms postgres=# create or replace procedure p(a inout int default 7) as $$ begin return; end; $$ language plpgsql; CREATE PROCEDURE Time: 1.182 ms postgres=# call p(); a ─── 0 (1 row) merlin
Re: JIT compiling with LLVM v9.0
On Thu, Jan 25, 2018 at 9:40 AM, Konstantin Knizhnik wrote: > As far as I understand generation of native code is now always done for all > supported expressions and individually by each backend. > I wonder it will be useful to do more efforts to understand when compilation > to native code should be done and when interpretation is better. > For example many JIT-able languages like Lua are using traces, i.e. query is > first interpreted and trace is generated. If the same trace is followed > more than N times, then native code is generated for it. > > In context of DBMS executor it is obvious that only frequently executed or > expensive queries have to be compiled. > So we can use estimated plan cost and number of query executions as simple > criteria for JIT-ing the query. > May be compilation of simple queries (with small cost) should be done only > for prepared statements... > > Another question is whether it is sensible to redundantly do expensive work > (llvm compilation) in all backends. > This question refers to shared prepared statement cache. But even without > such cache, it seems to be possible to use for library name some signature > of the compiled expression and allow > to share this libraries between backends. So before starting code > generation, ExecReadyCompiledExpr can first build signature and check if > correspondent library is already present. > Also it will be easier to control space used by compiled libraries in this > case. Totally agree; these considerations are very important. I tested several queries in my application that had >30-second compile times against a one-second run time. Not being able to manage when compilation happens is making it difficult to get a sense of llvm performance in the general case. Having explain analyze print compile time and being able to prepare llvm compiled queries ought to help measurement and tuning. 
There may be utility here beyond large analytical queries, as the ability to optimize spreads through the executor with the right trade-off management. This work is very exciting...thank you. merlin
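As it happens, the cost-threshold approach suggested in this thread is roughly what eventually shipped in PostgreSQL 11; a sketch of those knobs (the table name is hypothetical, and the threshold values shown are illustrative, not recommendations):

```sql
-- JIT is gated on planner cost estimates rather than execution traces
SET jit = on;
SET jit_above_cost = 100000;           -- compile only plans estimated above this cost
SET jit_optimize_above_cost = 500000;  -- spend extra LLVM optimization effort above this

-- EXPLAIN (ANALYZE) reports a JIT section with generation, inlining,
-- optimization, and emission timings, addressing the measurement concern
EXPLAIN (ANALYZE) SELECT sum(x) FROM big_table;
```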
Re: JIT compiling with LLVM v9.0
On Thu, Feb 1, 2018 at 8:16 PM, Thomas Munro wrote: > On Fri, Feb 2, 2018 at 2:05 PM, Andres Freund wrote: >> On 2018-02-01 09:32:17 -0800, Jeff Davis wrote: >>> On Wed, Jan 31, 2018 at 12:03 AM, Konstantin Knizhnik >>> wrote: >>> > The same problem takes place with old versions of GCC: I have to upgrade >>> > GCC >>> > to 7.2 to make it possible to compile this code. >>> > The problem in not in compiler itself, but in libc++ headers. >>> >>> How can I get this branch to compile on ubuntu 16.04? I have llvm-5.0 >>> and gcc-5.4 installed. Do I need to compile with clang or gcc? Any >>> CXXFLAGS required? >> >> Just to understand: You're running in the issue with the header being >> included from within the extern "C" {}? Hm, I've pushed a quick fix for >> that. > > That change wasn't quite enough: to get this building against libc++ > (Clang's native stdlb) I also needed this change to llvmjit.h so that > wouldn't be included with the wrong linkage (perhaps > you can find a less ugly way): > > +#ifdef __cplusplus > +} > +#endif > #include > +#ifdef __cplusplus > +extern "C" > +{ > +#endif This did the trick -- thanks. Sitting through 20-minute, computer-crashing link times really brings back C++ nightmares -- if anyone else needs to compile llvm/clang as I did (I'm stuck on 3.2 with my aging mint box), I strongly encourage you to use the gold linker. 
Question: when watching the compilation log, I see quite a few files being compiled with both O2 and O1, for example: clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 -O1 -Wno-ignored-attributes -Wno-unknown-warning-option -Wno-ignored-optimization-argument -I../../../../src/include -D_GNU_SOURCE -I/home/mmoncure/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -flto=thin -emit-llvm -c -o nbtsort.bc nbtsort.c Is this intentional? (didn't check standard compilation, it just jumped out). merlin
Re: JIT compiling with LLVM v9.0
On Wed, Jan 31, 2018 at 1:45 PM, Robert Haas wrote: > On Wed, Jan 31, 2018 at 1:34 PM, Andres Freund wrote: >>> The first one is a problem that's not going to go away. If the >>> problem of JIT being enabled "magically" is something we're concerned >>> about, we need to figure out a good solution, not just disable the >>> feature by default. >> >> That's a fair argument, and I don't really have a good answer to it. We >> could have a jit = off/try/on, and use that to signal things? I.e. it >> can be set to try (possibly default in version + 1), and things will >> work if it's not installed, but if set to on it'll refuse to work if not >> enabled. Similar to how huge pages work now. > > We could do that, but I'd be more inclined just to let JIT be > magically enabled. In general, if a user could do 'yum install ip4r' > (for example) and have that Just Work without any further database > configuration, I think a lot of people would consider that to be a > huge improvement. Unfortunately we can't really do that for various > reasons, the biggest of which is that there's no way for installing an > OS package to modify the internal state of a database that may not > even be running at the time. But as a general principle, I think > having to configure both the OS and the DB is an anti-feature, and > that if installing an extra package is sufficient to get the > new-and-improved behavior, users will like it. Bonus points if it > doesn't require a server restart. You bet. It'd be helpful to have some obvious, well advertised ways to determine when it's enabled and when it isn't, and to have a straightforward process to determine what to fix when it's not enabled and the user thinks it ought to be, though. merlin
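For what it's worth, the shipped PostgreSQL 11 implementation did grow this kind of introspection; a quick sketch of checking whether JIT is actually available and enabled (verify the details against the current documentation):

```sql
SHOW jit;                   -- the GUC discussed above; 'on' degrades gracefully
                            -- when the JIT provider library is not installed
SELECT pg_jit_available();  -- true only if the provider library actually loaded
```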
Re: procedures and plpgsql PERFORM
On Thu, Dec 14, 2017 at 11:56 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > 2017-12-14 18:33 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>: >> >> On Thu, Dec 14, 2017 at 10:46 AM, Pavel Stehule <pavel.steh...@gmail.com> >> wrote: >> > >> > >> > 2017-12-14 17:10 GMT+01:00 David G. Johnston >> > <david.g.johns...@gmail.com>: >> >> >> >> On Thu, Dec 14, 2017 at 8:22 AM, Merlin Moncure <mmonc...@gmail.com> >> >> wrote: >> >>> >> >>> On Thu, Dec 14, 2017 at 8:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> >>> > Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes: >> >>> >> We allow a function to be invoked as part of PERFORM statement in >> >>> >> plpgsql >> >>> >> ... >> >>> >> But we do not allow a procedure to be invoked this way >> >>> > >> >>> >> Procedures fit that category and like functions, I think, we should >> >>> >> allow them be invoked directly without any quoting and CALL >> >>> >> decoration. >> >>> > >> >>> > How is that going to work? What if the procedure tries to commit >> >>> > the >> >>> > current transaction? >> >>> > >> >>> > IOW, this is not merely a syntactic-sugar question. >> >>> >> >>> BTW, We've already come to (near-but good enough) consensus that >> >>> PERFORM syntax is really just unnecessary, and I submitted a patch to >> >>> make it optional (which I really need to dust off and complete). >> >> >> >> >> >> Except right now PERFORM doesn't exist in SQL and is a pl/pgsql keyword >> >> to >> >> specify a specific limited form of the SQL SELECT command. CALL is an >> >> SQL >> >> command. I don't see any real upside to allowing pl/pgsql to accept >> >> omission of the command tag while SQL cannot - at least not without a >> >> use-case describe why such syntax would be beneficial. And likely >> >> those use >> >> cases would revolve around some looping variant as opposed to a single >> >> stand-alone, result-less, CALL. 
>> >> >> >> If we do keep "PERFORM" in the pl/pgsql vocab I'd consider the >> >> following >> >> enhancement: >> >> PERFORM func() => SELECT func() >> >> PERFORM proc() => CALL proc() >> > >> > >> > I don't like this idea - functions are not procedures - can be nice if >> > it >> > will be visible. >> >> We need to get rid of PERFORM ASAP. Agree that we need to not obfuscate >> CALL. > > If we have a procedures, then functions without returned values lost a sense > - and I don't see any changes with PERFORM necessary. I don't think the presence of procedures really changes the thinking here. Having to simulate procedures with void returning functions wasn't really the point; it's an annoying syntax departure from SQL for little benefit other than assuming the users are wrong when they are not explicitly capturing the result. The topic was heavily discussed: https://www.postgresql.org/message-id/CAHyXU0zYbeT-FzuonaaycbS9Wd8d5JO%2B_niAygzYtv5FMdx4rg%40mail.gmail.com merlin
Re: procedures and plpgsql PERFORM
On Thu, Dec 14, 2017 at 10:46 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > 2017-12-14 17:10 GMT+01:00 David G. Johnston <david.g.johns...@gmail.com>: >> >> On Thu, Dec 14, 2017 at 8:22 AM, Merlin Moncure <mmonc...@gmail.com> >> wrote: >>> >>> On Thu, Dec 14, 2017 at 8:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> > Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes: >>> >> We allow a function to be invoked as part of PERFORM statement in >>> >> plpgsql >>> >> ... >>> >> But we do not allow a procedure to be invoked this way >>> > >>> >> Procedures fit that category and like functions, I think, we should >>> >> allow them be invoked directly without any quoting and CALL >>> >> decoration. >>> > >>> > How is that going to work? What if the procedure tries to commit the >>> > current transaction? >>> > >>> > IOW, this is not merely a syntactic-sugar question. >>> >>> BTW, We've already come to (near-but good enough) consensus that >>> PERFORM syntax is really just unnecessary, and I submitted a patch to >>> make it optional (which I really need to dust off and complete). >> >> >> Except right now PERFORM doesn't exist in SQL and is a pl/pgsql keyword to >> specify a specific limited form of the SQL SELECT command. CALL is an SQL >> command. I don't see any real upside to allowing pl/pgsql to accept >> omission of the command tag while SQL cannot - at least not without a >> use-case describe why such syntax would be beneficial. And likely those use >> cases would revolve around some looping variant as opposed to a single >> stand-alone, result-less, CALL. >> >> If we do keep "PERFORM" in the pl/pgsql vocab I'd consider the following >> enhancement: >> PERFORM func() => SELECT func() >> PERFORM proc() => CALL proc() > > > I don't like this idea - functions are not procedures - can be nice if it > will be visible. We need to get rid of PERFORM ASAP. Agree that we need to not obfuscate CALL. merlin
Re: procedures and plpgsql PERFORM
On Thursday, December 14, 2017, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > Hi, > We allow a function to be invoked as part of PERFORM statement in plpgsql > do $$ > begin perform pg_relation_size('t1'); end; $$ language plpgsql; > DO > > But we do not allow a procedure to be invoked this way > create procedure dummy_proc(a int) as $$ > begin null; end; > $$ language plpgsql; > CREATE PROCEDURE > > do $$ > begin perform dummy_proc(1); end; $$ language plpgsql; > ERROR: dummy_proc(integer) is a procedure > LINE 1: SELECT dummy_proc(1) >^ > HINT: To call a procedure, use CALL. > QUERY: SELECT dummy_proc(1) > CONTEXT: PL/pgSQL function inline_code_block line 2 at PERFORM > > The documentation of PERFORM [1] says > "For any SQL command that does not return rows, for example INSERT > without a RETURNING clause, you can execute the command within a > PL/pgSQL function just by writing the command." > > Procedures fit that category and like functions, I think, we should > allow them be invoked directly without any quoting and CALL > decoration. > I disagree. The SQL command is 'CALL'. The documentation is really only clarifying when PERFORM is explicitly required. merlin
Re: [HACKERS] Transaction control in procedures
On Wed, Dec 6, 2017 at 8:41 AM, Peter Eisentraut wrote: > On 12/5/17 13:33, Robert Haas wrote: >> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut >> wrote: >>> I think ROLLBACK in a cursor loop might not make sense, because the >>> cursor query itself could have side effects, so a rollback would have to >>> roll back the entire loop. That might need more refined analysis before >>> it could be allowed. >> >> COMMIT really has the same problem; if the cursor query has side >> effects, you can't commit those side effects piecemeal as the loop >> executed and have things behave sanely. > > The first COMMIT inside the loop would commit the cursor query. This > isn't all that different from what you'd get now if you coded this > manually using holdable cursors or just plain client code. Clearly, you > can create a mess if the loop body interacts with the loop expression, > but that's already the case. > > But if you coded something like this yourself now and ran a ROLLBACK > inside the loop, the holdable cursor would disappear (unless previously > committed), so you couldn't proceed with the loop. > > The SQL standard for persistent stored modules explicitly prohibits > COMMIT and ROLLBACK in cursor loop bodies. But I think people will > eventually want it. They may want it, but silently promoting all cursors to held ones is not the way to give it to them, unless we narrow it down to the 'for-loop derived cursor' only. Even then we should consider syntax decoration: FOR x IN SELECT WITH HOLD LOOP END LOOP; merlin
Re: feature request: consume asynchronous notification via a function
On Tue, Nov 21, 2017 at 2:16 PM, Merlin Moncure <mmonc...@gmail.com> wrote: > On Tue, Nov 21, 2017 at 12:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Robert Haas <robertmh...@gmail.com> writes: >>> On Tue, Nov 21, 2017 at 11:32 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >>>> I am very much looking at the new stored procedure functionality and >>>> imaging a loop like this: >>>> >>>> LOOP >>>> FOR r IN SELECT * FROM pg_get_notifications(30) >>>> LOOP >>>> PERFORM do_stuff(r); >>>> END LOOP; >>>> COMMIT; -- advance xmin etc >>>> END LOOP; >> >>> Yeah, if you keep the timeout fairly short, it would probably work OK >>> (with Peter's stuff). >> >> Traditionally, NOTIFY messages are delivered to the client only between >> transactions, so that there is no question about whether the >> message-delivery should roll back if the surrounding transaction aborts. >> It's not very clear to me what the behavior of pg_get_notifications() >> inside a transaction ought to be. Is it OK if it's a volatile function >> and the messages are just gone once the function has returned them, >> even if you fail to do anything about them because your transaction >> fails later? > > I think destroying upon consumption is OK. There are a lot of > mitigation strategies to deal with that issue and NOTIFY is for > signalling, not queuing. > >> (I'd be against having a function that returns more than one at a time, >> in any case, as that just complicates matters even more.) Hm, a less controversial approach might be to only allow consumption of notifications that were delivered at the start of transaction. Procedures could then issue an intervening 'COMMIT' statement to pick up new notifications. There'd be no reason for a timeout argument in that case obviously, so the end user would have to poll in order to pick up the notification, which I don't like. 
This would be an alternative approach to the way I do it today, which is to poll for a set table flag in a non-serializable transaction, maybe with enough differentiation in use to merit introduction. merlin
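The table-flag polling workaround described here can be sketched as follows (table and column names are hypothetical):

```sql
-- signaling table: a writer raises the flag, a poller consumes it
CREATE TABLE work_flag (id int PRIMARY KEY, pending boolean NOT NULL DEFAULT false);
INSERT INTO work_flag VALUES (1, false);

-- writer: raise the flag instead of (or in addition to) NOTIFY
UPDATE work_flag SET pending = true WHERE id = 1;

-- poller: atomically test-and-clear; a returned row means there is work
UPDATE work_flag SET pending = false
WHERE id = 1 AND pending
RETURNING id;
```

Unlike NOTIFY, the flag survives crashes and rollbacks of the consumer, which is one of the mitigation strategies alluded to above, at the cost of polling traffic and table bloat.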
Re: feature request: consume asynchronous notification via a function
On Tue, Nov 21, 2017 at 12:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> On Tue, Nov 21, 2017 at 11:32 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >>> I am very much looking at the new stored procedure functionality and >>> imaging a loop like this: >>> >>> LOOP >>> FOR r IN SELECT * FROM pg_get_notifications(30) >>> LOOP >>> PERFORM do_stuff(r); >>> END LOOP; >>> COMMIT; -- advance xmin etc >>> END LOOP; > >> Yeah, if you keep the timeout fairly short, it would probably work OK >> (with Peter's stuff). > > Traditionally, NOTIFY messages are delivered to the client only between > transactions, so that there is no question about whether the > message-delivery should roll back if the surrounding transaction aborts. > It's not very clear to me what the behavior of pg_get_notifications() > inside a transaction ought to be. Is it OK if it's a volatile function > and the messages are just gone once the function has returned them, > even if you fail to do anything about them because your transaction > fails later? I think destroying upon consumption is OK. There are a lot of mitigation strategies to deal with that issue and NOTIFY is for signalling, not queuing. > (I'd be against having a function that returns more than one at a time, > in any case, as that just complicates matters even more.) ok. merlin
feature request: consume asynchronous notification via a function
Hackers, Currently the only way that I know of to consume async notifications via SQL (as opposed to a client application) is via dblink_get_notify. This method isn't very good; it requires some extra support coding, eats a connection and a backend, and doesn't have any timeout facilities. The lack of a good facility to do this will become more troublesome if/when Peter's recent fantastic work to implement stored procedures in the database gets accepted; asynchronous notifications could be a more efficient mechanism for backend processes to signal each other than the current method of signalling via fields in a table. A good interface might look something like: pg_get_notifications( TimeOut INT DEFAULT 0, notify_name OUT TEXT, payload OUT TEXT, pid OUT INT) RETURNS SETOF RECORD AS... The function would return immediately by default, or wait until TimeOut seconds transpired. We'd still have to poll internally, so that signals could be checked etc, but this would be a nice way to consume notifications without any dependencies -- what do you think? merlin
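Usage of the proposed (not existing) function might look like this, following the signature sketched above:

```sql
-- hypothetical: block up to 30 seconds, then process whatever arrived
SELECT notify_name, payload, pid
FROM pg_get_notifications(30);

-- or a non-blocking drain, using the default zero timeout
SELECT * FROM pg_get_notifications();
```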
Re: [HACKERS] Transaction control in procedures
On Thu, Nov 16, 2017 at 5:35 PM, Simon Riggs wrote: > On 14 November 2017 at 13:09, Peter Eisentraut > wrote: > >>> *) Will pg_cancel_backend() cancel the currently executing statement >>> or the procedure? (I guess probably the procedure but I'm curious) >> >> Same as the way it currently works. It will raise an exception, which >> will travel up the stack and eventually issue an error or be caught. If >> someone knows more specific concerns here I could look into it, but I >> don't see any problem. >> >>> *) Will long running procedures be subject to statement timeout (and >>> does it apply to the entire procedure)? >> >> See previous item. >> >>> Will they be able to control >>> statement_timeout from within the procedure itself? >> >> The statement timeout alarm is set by the top-level execution loop, so >> you can't change a statement timeout that is already in progress. But >> you could change the GUC and commit it for the next top-level statement. >> >>> *) Will pg_stat_activity show the invoking CALL or the currently >>> executing statement? I see a strong argument for showing both of >>> these things. although I understand that's out of scope here. >> >> Not different from a function execution, i.e., top-level statement. > > Which is the "top-level statement"? The CALL or the currently > executing statement within the proc? I think you mean former. > > For the first two answers above the answer was "currently executing > statement", yet the third answer seems to be the procedure. So that is > a slight discrepancy. > > ISTM we would like > > 1) a way to cancel execution of a procedure > 2) a way to set a timeout to cancel execution of a procedure > > as well as > > 1) a way to cancel execution of a statement that is running within a procedure > 2) a way to set a timeout to cancel execution of a statement in a procedure > > Visibility of what a routine is currently executing is the role of a > debugger utility/API. 
How could you cancel a statement but not the procedure itself? Cancel-type errors (whether from a timeout or administrative action) are untrappable by design for very good reasons, and untrapped errors ought to return the database all the way to 'ready for query'. merlin
Re: pspg - psql pager
On Thu, Nov 16, 2017 at 11:04 AM, Pavel Stehule wrote: > 2017-11-16 16:40 GMT+01:00 Aaron W. Swenson : >> I likes it. I likes it a lot. >> >> And, you’ve got a PR on GitHub from me. >> >> It’s available on Gentoo now! This would be a great package for inclusion in pgdg packaging as well. Dependencies are minor and compile risk is low. merlin
Re: Transaction control in procedures
On Thu, Nov 16, 2017 at 12:36 PM, Peter Eisentraut wrote: > On 11/16/17 07:04, legrand legrand wrote: >> We are just opening the "close cursors on/at commit" specification ;o) >> >> - MS SQL server: cursor_close_on_commit >> - Firebird: close_cursors_at_commit >> - DB2: "with hold" syntax >> - ... >> >> I think it a plus to support keeping opened cursors at commit time, >> but impacts have to be checked in details ... > > I think the facilities to support this in PostgreSQL are already there. > We'd just have to tweak PL/pgSQL to make some of its internal portals > "held" and then clean them up manually at some later point. So I think > this is a localized detail, not a fundamental problem. Automatically persisting cursors (WITH HOLD) can have some very surprising performance considerations, except when the current code execution depends on that particular cursor, in which case the current behavior of raising a (hopefully better-worded) error seems appropriate. Cursors based on temporary tables could be exempt from having to be closed or checked on COMMIT. plpgsql does not have the facility to create held cursors FWICT...automatic promotion seems pretty dubious. It could certainly be added, and cursors so held could be exempt from being force closed/errored as well. In lieu of that, having users materialize data into temp tables for such cases seems reasonable. merlin
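The temp-table approach suggested at the end can be sketched like this (table and column names are hypothetical):

```sql
-- materialize the driving query up front, so the iteration no longer
-- depends on a cursor surviving an intervening COMMIT
CREATE TEMP TABLE pending_work AS
  SELECT id, payload FROM work_queue WHERE done = false;

DO $$
DECLARE r record;
BEGIN
  FOR r IN SELECT id, payload FROM pending_work LOOP
    RAISE NOTICE 'processing %', r.id;
  END LOOP;
END;
$$;
```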
Re: [HACKERS] Transaction control in procedures
On Wed, Nov 15, 2017 at 3:42 PM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > On 11/15/17 09:54, Merlin Moncure wrote: >> ... I noticed that: >> *) now() did not advance with commit and, >> *) xact_start via pg_stat_activity did not advance >> >> Shouldn't both of those advance with the in-loop COMMIT? > > I think you are correct. I'll include that in the next patch version. > It shouldn't be difficult. Thanks. A couple more things. *) This error message is incorrect now: postgres=# CREATE OR REPLACE PROCEDURE foo() AS $$ BEGIN LOOP SAVEPOINT x; END LOOP; END; $$ LANGUAGE PLPGSQL; CREATE PROCEDURE Time: 0.912 ms postgres=# call foo(); ERROR: cannot begin/end transactions in PL/pgSQL HINT: Use a BEGIN block with an EXCEPTION clause instead. CONTEXT: PL/pgSQL function foo() line 5 at SQL statement I guess there are a few places that assume pl/pgsql is always run from an in-transaction function. *) Exception handlers seem to override COMMITs. The following procedure will not insert any rows. I wonder if this is the correct behavior. I think there's a pretty good case to be made to raise an error if a COMMIT is issued while you're in an exception block. CREATE OR REPLACE PROCEDURE foo() AS $$ BEGIN LOOP INSERT INTO foo DEFAULT VALUES; COMMIT; RAISE EXCEPTION 'test'; END LOOP; EXCEPTION WHEN OTHERS THEN RAISE NOTICE '%', SQLERRM; END; $$ LANGUAGE PLPGSQL; *) The documentation could use some work. Would you like some help? merlin
Re: [HACKERS] Transaction control in procedures
On Wed, Nov 15, 2017 at 7:38 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
>>> Can we zero in on this? The question implied, 'can you do this
>>> without being in a transaction'? PERFORM do_stuff() is an implicit
>>> transaction, so it ought to end when the function returns right?
>>> Meaning, assuming I was not already in a transaction when hitting this
>>> block, I would not be subject to an endless transaction duration?
>>
>> In the server, you are always in a transaction, so that's not how this
>> works. I think this also ties into my first response above.
>
> I'll try this out myself, but as long as we can have a *bounded*
> transaction lifetime (basically the time to do stuff + 1 second) via
> something like:
> LOOP
>
>   COMMIT;
>   PERFORM pg_sleep(1);
> END LOOP;
>
> ... I'm good. I'll try your patch out ASAP. Thanks for answering all
> my questions.

Trying this out (v2, both patches, compiled clean, thank you!):

postgres=# CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
  LOOP
    PERFORM 1;
    COMMIT;
    RAISE NOTICE '%', now();
    PERFORM pg_sleep(1);
  END LOOP;
END;
$$ LANGUAGE PLPGSQL;
CREATE PROCEDURE
Time: 0.996 ms
postgres=# call foo();
NOTICE:  2017-11-15 08:52:08.936025-06
NOTICE:  2017-11-15 08:52:08.936025-06
...

I noticed that:
*) now() did not advance with commit and,
*) xact_start via pg_stat_activity did not advance

Shouldn't both of those advance with the in-loop COMMIT?

merlin
Re: pspg - psql pager
On Wed, Nov 15, 2017 at 4:45 AM, Andreas Joseph Krogh wrote:
> On Wednesday, November 15, 2017 at 10:41:45, Pavel Stehule wrote:
>> Hi
>>
>> I finished new pager, that can be interesting for postgresql expert users.
>
> Thanks for making this, "-X --no-mouse" made my day.

This was my first bugfix/suggestion :-D. I use psql all day, every day. PSPG IS AWESOME, IF YOU USE PSQL GET IT AND DON'T LOOK BACK. There are no major issues with it; it's wonderful.

I do not advise a global definition of PAGER... it will break other utilities (like man). I find the best way to invoke the functionality is via a bash alias:

alias psql='PAGER="/home/mmoncure/pspg -s0 -X --no-mouse" psql'

Thank you Pavel.

merlin
Re: [HACKERS] Transaction control in procedures
On Wed, Nov 8, 2017 at 5:48 PM, Simon Riggs wrote:
> On 31 October 2017 at 15:38, Peter Eisentraut wrote:
>> Here is a patch that implements transaction control in PL/Python
>> procedures.  (This patch goes on top of "SQL procedures" patch v1.)
>
> The patch is incredibly short for such a feature, which is probably a
> good indication that it is feasible.
>
> Amazing!

I have to agree with that. I'm really excited about this... Some questions:

*) Will it be possible to do operations like this in pl/pgsql?

BEGIN
  SELECT INTO r * FROM foo;
  START TRANSACTION;  -- perhaps we ought to have a special function for this instead (BEGIN is reserved, etc.)
  SET transaction_isolation TO serializable;
  ...

*) Will there be any negative consequences to a procedure running with an unbounded run time? For example, something like:

LOOP
  SELECT check_for_stuff_to_do();

  IF stuff_to_do THEN
    do_stuff();
  ELSE
    PERFORM pg_sleep(1);
  END IF;
END LOOP;

*) Will pg_cancel_backend() cancel the currently executing statement or the procedure? (I guess probably the procedure, but I'm curious)

*) Will long running procedures be subject to statement_timeout (and does it apply to the entire procedure)? Will they be able to control statement_timeout from within the procedure itself?

*) Will pg_stat_activity show the invoking CALL or the currently executing statement? I see a strong argument for showing both of these things, although I understand that's out of scope here.

If these questions (especially the first two) come down the correct way, then it will mean that I can stop coding in other languages (primarily bash) for a fairly large number of cases that I really think belong in the database itself. This would really simplify coding; some things in bash are really awkward to get right, such as a mutex to guarantee single script invocation.
My only real dependency on the operating system environment at that point would be cron to start the background daemon process (which would immediately set an advisory lock). I'm somewhat surprised that SPI is the point of attack for this functionality, but if it works that's really the best case scenario (the only downside I can see is that the various out-of-core PLs have to implement the interface individually).

merlin
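The single-invocation mutex mentioned above maps naturally onto session-level advisory locks. A hypothetical sketch, assuming the transaction-control patch lands (the procedure name, lock key 42, and do_stuff() are all made-up examples):

    CREATE OR REPLACE PROCEDURE daemon_loop() AS
    $$
    BEGIN
      -- Session-level advisory locks survive COMMIT, so taking one here
      -- guards the whole loop; a second cron-launched instance bails out.
      IF NOT pg_try_advisory_lock(42) THEN
        RAISE NOTICE 'another instance is already running';
        RETURN;
      END IF;

      LOOP
        PERFORM do_stuff();  -- stands in for the real work
        COMMIT;              -- keeps transaction lifetime bounded
        PERFORM pg_sleep(1);
      END LOOP;
    END;
    $$ LANGUAGE plpgsql;

The lock is released automatically when the backend exits, so a crashed daemon doesn't leave a stale mutex behind, which is exactly what's awkward to get right with lock files in bash.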