Re: [HACKERS] Separate connection handling from backends
On 7 December 2016 at 22:27, Kevin Grittner wrote: > I don't know how that execution model would compare to what we use > now in terms of performance, but its popularity makes it hard to > ignore as something to consider. Those engines also tend to be threaded. They can stash state in memory and hand it around between executors in ways we cannot really do. I'd love to see a full separation of executor from session in postgres, but I can't see how it could be at all practical. The use of globals for state and the assumption that session == backend is baked in way too deep. At least, I think it'd be a slow and difficult thing to change, and would need many steps. Something like what was proposed upthread would possibly make sense as a first step. But again, I don't see anyone who's likely to actually do it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On Wed, Dec 7, 2016 at 12:36 AM, Jim Nasby wrote: > The way I'm picturing it backends would no longer be directly > tied to connections. The code that directly handles connections > would grab an available backend when a statement actually came in > (and certainly it'd need to worry about transactions and session > GUCs). If we're going to consider that, I think we should consider going all the way to the technique used by many (most?) database products, which is to have a configurable number of "engines" that pull work requests from queues. We might have one queue for disk writes, one for disk reads, one for network writes, etc. Traditionally, each engine spins over attempts to read from the queues until it finds a request to process; blocking only if several passes over all queues come up empty. It is often possible to bind each engine to a particular core. Current process-local state would be passed around, attached to queued requests, in a structure associated with the connection. I don't know how that execution model would compare to what we use now in terms of performance, but its popularity makes it hard to ignore as something to consider. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 12/6/16 10:34 PM, Craig Ringer wrote: In other words, we could start to separate session state from executor state in a limited manner. That'd definitely be valuable, IMO; it's a real shame that Pg's architecture so closely couples the two. So - is just doing "PgInCoreBouncer" a good idea? No, I don't think so. But there are potentially good things to be done in the area. Right. What I don't see here is a patch, or a vague proposal for a patch, so I'm not sure how this can go past the hot-air stage. Yeah, I brought it up because I think there's potential tie-in with other things that have been discussed (notably async transactions, but maybe BG workers and parallel query could benefit too). Maybe it would make sense as part of one of those efforts. Though, this is something that's asked about often enough that it'd probably be possible to round up a few companies to fund it. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 12/6/16 6:19 PM, Tom Lane wrote: I'm kind of mystified how a simple code restructuring could solve the fundamental problems with a large number of backends. It sounds like what you're describing would just push the problem around, you would end up with some other maximum instead, max_backends, or max_active_backends, or something like that with the same problems. What it sounds like to me is building a connection pooler into the backend. I'm not really convinced we ought to go there. The way I'm picturing it backends would no longer be directly tied to connections. The code that directly handles connections would grab an available backend when a statement actually came in (and certainly it'd need to worry about transactions and session GUCs). So in a way it's like a pooler, except it'd be able to do things that poolers simply can't (like safely switch the user the backend is using). I think there might be other uses as well, since there's several other places where we need something that's kind-of like a backend, but if Heikki's work radically shifts the expense of running many thousands of backends then it's probably not worth doing. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 7 December 2016 at 10:19, Tom Lane wrote: > What it sounds like to me is building a connection pooler into the > backend. I'm not really convinced we ought to go there. If we do, it probably needs to be able to offer things that out-of-tree ones can't. The main things I see that you can't do sensibly with an out-of-tree pooler are: * Re-use a backend for different session users. You can SET SESSION AUTHORIZATION, but once you hand the connection off to the client they can just do it again or RESET SESSION AUTHORIZATION and whammo, they're a superuser. Same issue applies for SET ROLE and RESET ROLE. * Cope with session-level state when transaction pooling. We probably can't do anything much about WITH HOLD cursors, advisory locks, etc, but we could save and restore GUC state and a few other things, and we could detect whether or not we can save and restore state so we could switch transparently between session and transaction pooling. * Know, conclusively, whether a query is safe to reroute to a read-only standby, without hard coded lists of allowed functions, iffy SQL parsers, etc. Or conversely, transparently re-route queries from standbys to a read/write master. In other words, we could start to separate session state from executor state in a limited manner. That'd definitely be valuable, IMO; it's a real shame that Pg's architecture so closely couples the two. So - is just doing "PgInCoreBouncer" a good idea? No, I don't think so. But there are potentially good things to be done in the area. What I don't see here is a patch, or a vague proposal for a patch, so I'm not sure how this can go past the hot-air stage. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 12/6/16 1:46 PM, Adam Brusselback wrote: BTW, it just occurred to me that having this separation would make it relatively easy to support re-directing DML queries from a replica to the master; if the backend throws the error indicating you tried to write data, the connection layer could re-route that. This also sounds like it would potentially allow re-routing the other way where you know the replica contains up-to-date data, couldn't you potentially re-direct read only queries to your replicas? That's a lot more complicated, so I don't see that happening anytime soon. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
Greg Stark writes: > On 5 December 2016 at 19:48, Jim Nasby wrote: >> One solution to this would be to segregate connection handling from actual >> backends, somewhere along the lines of separating the main loop from the >> switch() that handles libpq commands. Benefits: > I'm kind of mystified how a simple code restructuring could solve the > fundamental problems with a large number of backends. It sounds like > what you're describing would just push the problem around, you would > end up with some other maximum instead, max_backends, or > max_active_backends, or something like that with the same problems. What it sounds like to me is building a connection pooler into the backend. I'm not really convinced we ought to go there. > Heikki's work with CSN would actually address the main fundamental > problem. Instead of having to scan PGPROC when taking a snapshot > taking a snapshot would be O(1). While that would certainly improve matters, I suspect there are still going to be bottlenecks arising from too many backends. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 5 December 2016 at 19:48, Jim Nasby wrote: > One solution to this would be to segregate connection handling from actual > backends, somewhere along the lines of separating the main loop from the > switch() that handles libpq commands. Benefits: I'm kind of mystified how a simple code restructuring could solve the fundamental problems with a large number of backends. It sounds like what you're describing would just push the problem around, you would end up with some other maximum instead, max_backends, or max_active_backends, or something like that with the same problems. At best it would help people who have connection pooling or but few connections active at any given time. Heikki's work with CSN would actually address the main fundamental problem. Instead of having to scan PGPROC when taking a snapshot taking a snapshot would be O(1). There might need to be scans of the list of active transactions but never of all connections whether they're in a transaction or not. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
> > BTW, it just occurred to me that having this separation would make it > relatively easy to support re-directing DML queries from a replica to the > master; if the backend throws the error indicating you tried to write data, > the connection layer could re-route that. This also sounds like it would potentially allow re-routing the other way where you know the replica contains up-to-date data, couldn't you potentially re-direct read only queries to your replicas?
Re: [HACKERS] Separate connection handling from backends
On Mon, Dec 5, 2016 at 6:54 PM, Jim Nasby wrote: > On 12/5/16 2:14 PM, David Fetter wrote: >> What do you see as the relationship between this proposal and the >> earlier one for admission control? >> >> https://www.postgresql.org/message-id/4b38c1c502250002d...@gw.wicourts.gov > > Without having read the paper reference in that email or the rest of the > thread... > One big difference from what Kevin describe though: I don't think it makes > sense for the connection layer to be able to parse queries. I suspect it > would take a very large amount of work to allow something that's not a > full-blown backend to parse, because it needs access to the catalogs. > *Maybe* it'd be possible if we used a method other than ProcArray to > register the snapshot that required, but you'd still have to duplicate all > the relcache stuff. I don't recall ever, on the referenced thread or any other, suggesting what you describe. Basically, I was suggesting that we create a number hooks which an admission control policy (ACP) could tie into, and we could create pluggable APCs. One ACP that I think would be useful would be one that ties into a hook placed at the point(s) where a transaction is attempting to acquire its first "contentious resource" -- which would include at least snapshot and locks. If the user was a superuser it would allow the transaction to proceed; otherwise it would check whether the number of transactions which were holding contentious resources had reached some (configurable) limit. If allowing the transaction to proceed would put it over the limit, the transaction would be blocked and put on a queue behind any other transactions which had already been blocked for this reason, and a transaction from the queue would be unblocked whenever the count of transactions holding contentious resources fell below the threshold. I don't see where parsing even enters into this. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 12/5/16 2:14 PM, David Fetter wrote: One solution to this would be to segregate connection handling from actual backends, somewhere along the lines of separating the main loop from the switch() that handles libpq commands. Benefits: [interesting stuff elided] What do you see as the relationship between this proposal and the earlier one for admission control? https://www.postgresql.org/message-id/4b38c1c502250002d...@gw.wicourts.gov Without having read the paper reference in that email or the rest of the thread... I think my proposal would completely eliminate the need for what Kevin proposed as long as the "connection" layer released the backend that it was using as soon as possible (namely, as soon as the backend was no longer in a transaction). This does assume that the connection layer is keeping a copy of all user/session settable GUCs. I don't think we need that ability in the first pass, but it would be very high on the desired feature list (because it would allow "transaction-level" pooling). Actually, we could potentially do one better... if a backend sat idle in transaction for long enough, we could "save" that transaction state and free up the backend to do something else. I'm thinking this would be similar to a prepared transaction, but presumably there'd be some differences to allow for picking the transaction back up. One big difference from what Kevin describe though: I don't think it makes sense for the connection layer to be able to parse queries. I suspect it would take a very large amount of work to allow something that's not a full-blown backend to parse, because it needs access to the catalogs. *Maybe* it'd be possible if we used a method other than ProcArray to register the snapshot that required, but you'd still have to duplicate all the relcache stuff. BTW, it just occurred to me that having this separation would make it relatively easy to support re-directing DML queries from a replica to the master; if the backend throws the error indicating you tried to write data, the connection layer could re-route that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On Mon, Dec 05, 2016 at 01:48:03PM -0600, Jim Nasby wrote: > max_connections is a frequent point of contention between users and > developers. Users want to set it high so they don't have to deal with Yet > More Software (pgpool or pgBouncer); PG developers freak out because > backends are pretty heavyweight, there's some very hot code that's sensitive > to the size of ProcArray, lock contention, etc. > > One solution to this would be to segregate connection handling from actual > backends, somewhere along the lines of separating the main loop from the > switch() that handles libpq commands. Benefits: [interesting stuff elided] What do you see as the relationship between this proposal and the earlier one for admission control? https://www.postgresql.org/message-id/4b38c1c502250002d...@gw.wicourts.gov Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers