Thank you, Alvaro, Andres, Magnus, Peter, and Tom for your thoughts! I
consolidated all the responses, and provided the high level overview of the
Azure PostgreSQL database architecture, and provided additional details at
the bottom of the message.
> Tom Lane: We don't actually have any concept of a server name at the
> moment, and it isn't very clear what introducing that concept would buy.
> Please explain.
> Tom Lane: I think for instance you could have one pgbouncer instance (or
> whatever pooler) pointing >to several different servers. So the client
> connects to the pooler and indicates which of the servers to >connect to.
> Magnus: The normal one to use for pgbonucer today is, well, "database
> name". You can then have >pgbouncer map different databases to different
> backend servers. It's fairly common in my experience >to have things like
> "dbname" and "dbname-ro" (for example) as different
> database names with one mapping to the master and one mapping to a
> load-balanced set of standbys, >and things like that. ISTM that using the
> database name is a good choice for that.
PgBouncer for example assumes that the database names are unique across the
database clusters it is serving. Our front-end Gateways can serve tens of
thousands of Postgres servers spanning multiple customers, and
organizations, and enforcing the database names being unique is not possible
for the users of the service.
> Magnus: For the original idea in this thread, using something like
> dbname@server seems a more logical choice than username@server.
We considered this option but connecting to the database from the GUI tools
is not very intuitive / possible. Also /c option now requires including full
cluster_name every time user connect.
> Tome Lane: I should think that in such cases, the end client is exactly
> not what you want to be choosing which server it gets redirected to.
> You'd be wanting to base that on policies defined at the pooler. There
> already plenty of client-supplied attributes you could use as inputs for
> such policies (user name and application name, for instance). Why do we
> need to incur a protocol break to add another one?
This is optional and is not a protocol break. This doesn’t make the cluster
name field mandatory in the startup message. If the client specifies the
extra parameter in the connection string to include the server name in the
startup message then only it will be included otherwise it is not. In a
proxy scenario, end clients startup message doesn’t need to include the
server name in it, and for proxy it is optional to include this field while
sending the startup message to the server. It is preferred to set the field
for the Azure PostgreSQL service instead of appending the cluster name to
the user name.
Yes, there are other fields like application name, but it is not easy to use
them from GUI tools like Pg Admin. Overloading a field is also not always
intuitive to the users, and some of the applications potentially using them
for different purposes. Default database name is the user name for some of
the clients, and as we are overloading user name today and the startup
message has user@clustername in it. This behavior causing logins to fail
with invalid database name as the database doesn’t exist on the server.
Using database name may not be ideal because GUI tools doesn’t assume
database has server name in it.
> Peter: I think this could be useful if it's something like what HTTP uses.
The proposal is similar to http host header field in HTTP1.1. This allows
the origin server or gateway to differentiate between internally-ambiguous
URLs, such as the root "/" URL of a server for multiple host names on a
single IP address. For reference,
database for PostgreSQL follows the similar pattern where all the database
cluster dns records points to our gateways nodes, and we would like to
resolve them based on the server name field.
High level overview of Azure Database For PostgreSQL service.
Azure database for PostgreSQL is a cloud service that hosts several
PostgreSQL database clusters a.k.a. servers with in a region, potentially
tens of thousands of database clusters. We have several front-end proxies
(called Gateways) deployed to proxy the customer connections to the
appropriate database cluster. Each database cluster has a unique name, and a
DNS record is provisioned with that name. For example, if pgserver is the
name of the database cluster, then it has a dns record
pgserver.postgres.database.azure.com associated with it. The DNS record of a
customer database server will be pointing to the front-end Gateways, and the
customer request reaches these Gateways. Gateway requires database cluster
name to proxy the connection to the appropriate database cluster. In the
absence of this it is impossible for us to proxy the request. Startup
message containing the server name helps us route the requests to the right
database clusters, without customers overloading the user name field in the
startup message. For a friction free user experience, we are trying to make
the changes in the client code, PostgreSQL code, and minimize the changes
required in the application.
Here is the logical flow of server creation, and the connection
Database cluster provisioning workflow:
1. Customer requests Azure to create a cluster
2. Azure registers the request, and adds the necessary metadata in the
central metadata store
3. Create a database cluster with the customer requested size, and with the
provided user name and password combination
4. DNS record created for the server for the customer to connect to. This
points to our front-end gateway nodes
5. Make proxies aware of the location of the database cluster created, and
proxies caches this info, and refreshes if location changes
1. Client uses the host name provided to connect to the database cluster
provisioned in Azure, and today they include database cluster name in the
user name (username@pgserver)
2. Request reaches the Gateway (proxy)
3. Gateway parses the packet, does the lookup, and creates a proxy
connection to the right database cluster (in the lookup success path)
4. SSL handshake happens between the Postgres process, and the Gateway
5. Gateway sends the startup packet removing the hostname from the startup
6. Postgres responds with the password authentication request message
7. Gateway sends the auth request back to the client
8. Client response with the password message
9. Gateway forwards the same to the database cluster
10. Database cluster authenticates the client and sends success / failure
response to the client
LibPQ Connection string format:
host=localhost port=5432 dbname=mydb connect_timeout=10
include_cluster_name is an optional parameter and setting this true includes
cluster_name in the startup message, and will be ignored otherwise.
Connection via psql:
Today customers connect as follows:
psql -h pgserver.postgres.database.azure.com -U pgadmin@pgserver -d Postgres
After the proposed change the same command looks like:
psql -h pgserver.postgres.database.azure.com -U pgadmin -d Postgres
psql -h pgserver.postgres.database.azure.com -U pgadmin -d Postgres -N
-N / -- includeClusterName is optional here.
View this message in context:
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: