Re: Built-in connection pooling

2018-08-27 Thread Konstantin Knizhnik


A new version of the built-in connection pool is attached to this mail.
Now the client's startup packet is received by one of the listener workers, 
so the postmaster knows the database/user name of the received connection 
and is able to redirect it to the proper connection pool. Right now SSL is 
not supported.


I have also provided a general mechanism for moving static variables to the 
session context. The file include/storage/sessionvars.h contains the list 
of such variables, which are stored in the session context on reschedule.
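The file itself is not included in this excerpt; a minimal sketch of how 
such a registry might look, assuming an X-macro style list (the exact 
variable set is an assumption; myTempNamespace appears in the namespace.c 
hunk below):

/*
 * Hypothetical sketch of include/storage/sessionvars.h: one central
 * list of backend-private static variables that must be saved to the
 * session context and restored on reschedule.  Oid comes from postgres.h.
 */
#define SESSION_VARIABLES \
	SESSION_VAR(Oid, myTempNamespace) \
	SESSION_VAR(Oid, myTempToastNamespace)

/* Expand once to declare the saved copies per session... */
typedef struct SessionVars
{
#define SESSION_VAR(type, name) type name;
	SESSION_VARIABLES
#undef SESSION_VAR
} SessionVars;

/* ...and expand again to generate the save/restore code on reschedule. */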


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/test_decoding/sql/messages.sql b/contrib/test_decoding/sql/messages.sql
index cf3f773..14c4163 100644
--- a/contrib/test_decoding/sql/messages.sql
+++ b/contrib/test_decoding/sql/messages.sql
@@ -23,6 +23,8 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'for
 
 -- test db filtering
 \set prevdb :DBNAME
+show session_pool_size;
+show session_pool_ports;
 \c template1
 
 SELECT 'otherdb1' FROM pg_logical_emit_message(false, 'test', 'otherdb1');
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 5d13e6a..5a93c7e 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -178,7 +178,6 @@ static List *overrideStack = NIL;
  * committed its creation, depending on whether myTempNamespace is valid.
  */
 static Oid	myTempNamespace = InvalidOid;
-
 static Oid	myTempToastNamespace = InvalidOid;
 
 static SubTransactionId myTempNamespaceSubID = InvalidSubTransactionId;
@@ -193,6 +192,7 @@ char	   *namespace_search_path = NULL;
 /* Local functions */
 static void recomputeNamespacePath(void);
 static void InitTempTableNamespace(void);
+static Oid  GetTempTableNamespace(void);
 static void RemoveTempRelations(Oid tempNamespaceId);
 static void RemoveTempRelationsCallback(int code, Datum arg);
 static void NamespaceCallback(Datum arg, int cacheid, uint32 hashvalue);
@@ -460,9 +460,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (strcmp(newRelation->schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-				InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(newRelation->schemaname, false);
@@ -471,9 +469,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 	else if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 	else
 	{
@@ -482,8 +478,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -2921,9 +2916,7 @@ LookupCreationNamespace(const char *nspname)
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 
 	namespaceId = get_namespace_oid(nspname, false);
@@ -2986,9 +2979,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (strcmp(schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-				InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(schemaname, false);
@@ -3001,8 +2992,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -3254,16 +3244,28 @@ int
 GetTempNamespaceBackendId(Oid namespaceId)
 {
 	int			result;
-	char	   *nspname;
+	char	   *nspname,
+			   *addlevel;
 
 	/* See if the namespace name starts with "pg_temp_" or "pg_toast_temp_" */
 	nspname = get_namespace_name(namespaceId);
 	if (!nspname)
 		return InvalidBackendId;	/* no such namespace? */
 	if (strncmp(nspname, "pg_temp_", 8) == 0)
-		result = atoi(nspname + 8);
+	{
+		/* check for session id */
+		if ((addlevel = strstr(nspname + 8, "_")) != NULL)
+			result = atoi(addlevel + 1);
+		else
+			result = atoi(nspname + 8);
+	}
 	else if (strncmp(nspname, "pg_toast_temp_", 14) == 0)
-		result = atoi(nspname + 14);
+	{
+		if ((addlevel = strstr(nspname + 14, "_")) != NULL)
+			result = atoi(addlevel + 1);
+		
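The patch is truncated at this point; by symmetry with the pg_temp_ branch 
above, the hunk presumably continues as follows (a reconstruction, not the 
actual patch text):

+		else
+			result = atoi(nspname + 14);
+	}

The visible code implies the naming scheme: pooled-session temp namespaces 
carry an extra numeric suffix (e.g. "pg_temp_<n>_<m>"), and the number 
after the second underscore is taken as the backend id, while plain 
"pg_temp_<n>" names keep working unchanged.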

Re: Built-in connection pooling

2018-05-18 Thread Robert Haas
On Thu, May 17, 2018 at 9:09 PM, Bruce Momjian  wrote:
>> However, I think that's probably worrying about the wrong end of the
>> problem first.  IMHO, what we ought to start by doing is considering
>> what a good architecture for this would be, and how to solve the
>> general problem of per-backend session state.  If we figure that out,
>> then we could worry about optimizing whatever needs optimizing, e.g.
>> memory usage.
>
> Yes, I think this matches my previous question --- if we are going to
> swap out session state to allow multiple sessions to multiplex in the
> same OS process, and that swapping has similar overhead to how the OS
> swaps processes, why not just let the OS continue doing the process
> swapping.
>
> I think we need to first find out what it is that makes high session
> counts slow.  For example, if we swap out session state, will we check
> the visibility rules for the swapped out session?  If not, and that is
> what makes swapping session state make Postgres faster, let's just find
> a way to skip checking visibility rules for inactive sessions and get
> the same benefit more simply.

I don't think we're really in agreement.  I am pretty convinced that
we can save a lot of memory and other CPU resources if we don't need a
separate process for each session.  I don't have any doubt that the
benefit is there.  My point is rather that we need an organized way to
attack the problem of saving and restoring session state, not an
ad-hoc approach for each particular kind of session state.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-05-17 Thread Bruce Momjian
On Fri, May  4, 2018 at 03:25:15PM -0400, Robert Haas wrote:
> On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:
> > If we are breaking 1:1 backend:session relationship, what controls
> > would we have to manage resource consumption?
> 
> I mean, if you have a large number of sessions open, it's going to
> take more memory in any design.  If there are multiple sessions per
> backend, there may be some possibility to save memory by allocating it
> per-backend rather than per-session; it shouldn't be any worse than if
> you didn't have pooling in the first place.
> 
> However, I think that's probably worrying about the wrong end of the
> problem first.  IMHO, what we ought to start by doing is considering
> what a good architecture for this would be, and how to solve the
> general problem of per-backend session state.  If we figure that out,
> then we could worry about optimizing whatever needs optimizing, e.g.
> memory usage.

Yes, I think this matches my previous question --- if we are going to
swap out session state to allow multiple sessions to multiplex in the
same OS process, and that swapping has similar overhead to how the OS
swaps processes, why not just let the OS continue doing the process
swapping.

I think we need to first find out what it is that makes high session
counts slow.  For example, if we swap out session state, will we check
the visibility rules for the swapped out session?  If not, and that is
what makes swapping session state make Postgres faster, let's just find
a way to skip checking visibility rules for inactive sessions and get
the same benefit more simply.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-05-08 Thread Konstantin Knizhnik



On 05.05.2018 00:54, Merlin Moncure wrote:

On Fri, May 4, 2018 at 2:25 PM, Robert Haas  wrote:

On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:

If we are breaking 1:1 backend:session relationship, what controls
would we have to manage resource consumption?

I mean, if you have a large number of sessions open, it's going to
take more memory in any design.  If there are multiple sessions per
backend, there may be some possibility to save memory by allocating it
per-backend rather than per-session; it shouldn't be any worse than if
you didn't have pooling in the first place.

It is absolutely worse, or at least can be.   plpgsql plan caches can
be GUC dependent due to search_path; you might get a different plan
depending on which tables resolve into the function.  You might
rightfully regard this as an edge case but there are other 'leakages',
for example, sessions with different planner settings obviously ought
not to share backend plans.  Point being, there are many
interdependent things in the session that will make it difficult to
share some portions but not others.


Right now, in my built-in connection pool implementation there is a shared 
prepared statement cache for all sessions in one backend, but in fact each 
session has its own set of prepared statements: I just append the session 
identifier to the prepared statement name to make it unique.
So there is no problem with different execution plans for different 
clients caused by specific GUC settings (like enable_seqscan or 
max_parallel_workers_per_gather).
But the primary reason for this behavior is to avoid prepared statement 
name conflicts between different clients.
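A minimal sketch of that mangling (the helper name and signature are 
illustrative, not the patch's actual code):

#include <stdio.h>

/*
 * Illustrative only: make a client's prepared-statement name unique
 * within the backend-wide cache by appending its session id, so two
 * clients can both PREPARE "q1" without colliding.
 */
static void
mangle_statement_name(char *dst, size_t dstlen,
					  const char *name, int session_id)
{
	snprintf(dst, dstlen, "%s_%d", name, session_id);
}

This is also what produces the "mangled" names visible in 
pg_prepared_statements, mentioned elsewhere in this thread.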


From my point of view, there are very few cases where using client-specific 
plans makes sense.
In most cases the requirement is quite the opposite: I want to be able to 
prepare an execution plan (using hints, which Postgres lacks, GUCs, 
adjusted statistics, ...) which will be used by all clients.
The most natural and convenient way to achieve this is a shared plan cache.
But a shared plan cache is a different story, not directly related to 
connection pooling.



Point being, there are many interdependent things in the session that will make 
it difficult to share some portions but not others.


I do not see so many such things... Yes, GUCs can affect behavior within a 
session. But GUCs are already supported: each session can have its own set 
of GUCs.
Prepared plans may depend on GUCs, but they are also private to each 
session now. What else?


And in any case, with an external connection pooler you are not able to use 
session semantics at all: GUCs, prepared statements, temporary tables, 
advisory locks, ...
With the built-in connection pooler you can use sessions, but with some 
restrictions (lack of advisory locks, for example). It is better than 
nothing, isn't it?





--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-05-07 Thread Robert Haas
On Fri, May 4, 2018 at 5:54 PM, Merlin Moncure  wrote:
>> I mean, if you have a large number of sessions open, it's going to
>> take more memory in any design.  If there are multiple sessions per
>> backend, there may be some possibility to save memory by allocating it
>> per-backend rather than per-session; it shouldn't be any worse than if
>> you didn't have pooling in the first place.
>
> It is absolutely worse, or at least can be.   plpgsql plan caches can
> be GUC dependent due to search_path; you might get a different plan
> depending on which tables resolve into the function.  You might
> rightfully regard this as an edge case but there are other 'leakages',
> for example, sessions with different planner settings obviously ought
> not to share backend plans.  Point being, there are many
> interdependent things in the session that will make it difficult to
> share some portions but not others.

I think you may be misunderstanding my remarks.  Suppose I've got 10
real connections multiplexed across 1000 sessions.  Barring a
crazy-stupid implementation, that should never use more memory than
1000 completely separate connections.  (How could it?)  It will of
course use a lot more memory than 10 real connections handling 10
sessions, but that's to be expected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-05-07 Thread Konstantin Knizhnik



On 04.05.2018 18:22, Merlin Moncure wrote:

On Thu, May 3, 2018 at 12:01 PM, Robert Haas  wrote:

On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:

What _I_ (maybe not others) want is a
faster pgbouncer that is integrated into the database; IMO it does
everything exactly right.

I have to admit that I find that an amazing statement.  Not that
pgbouncer is bad technology, but saying that it does everything
exactly right seems like a vast overstatement.  That's like saying
that you don't want running water in your house, just a faster motor
for the bucket you use to draw water from the well.

Well you certainly have a point there; I do have a strong tendency for
overstatement :-).

Let's put it like this: being able to have connections funnel down to
a smaller number of sessions is nice feature.  Applications that are
large, complex, or super high volume have a tendency towards stateless
(with respect to the database session) architecture anyways so I tend
not to mind lack of session features when pooling (prepared statements
perhaps being the big outlier here).  It really opens up a lot of
scaling avenues.  So a better phrased statement might be, "I
like the way pgbouncer works, in particular transaction mode pooling
from the perspective of the applications using it".  Current main pain
points are the previously mentioned administrative headaches and
better performance from a different architecture (pthreads vs libev)
would be nice.

I'm a little skeptical that we're on the right path if we are pushing
a lot of memory consumption into the session level where a session is
pinned all the way back to a client connection. plpgsql function plan
caches can be particularly hungry on memory and since sessions have
their own GUC ISTM each session has to have their own set of them
since plans depend on search path GUC which is session specific.
Previous discussions on managing cache memory consumption (I do dimly
recall you making a proposal on that very thing) centrally haven't
gone past planning stages AFAIK.

If we are breaking 1:1 backend:session relationship, what controls
would we have to manage resource consumption?


Most of the resource consumption is related to backends, not to sessions.
It is first of all the catalog and relation caches: if there are thousands 
of tables in a database, these caches (whose size is not limited now) can 
grow up to several megabytes.
Taking into account that on modern SMP systems with hundreds of CPU cores 
it may be reasonable to spawn hundreds of backends, the total memory 
footprint of these caches can be very significant.
This is why I think that we should move towards shared caches... But this 
trip is not expected to be so easy.


Right now the connection pooler allows handling many more user sessions 
than there are active backends, so it helps to partly solve this problem 
with resource consumption.
The session context itself is not expected to be very large: changed GUCs + 
prepared statements.
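A rough sketch of what such a session context might hold, pulled together 
from the descriptions in this thread (all field names are illustrative 
assumptions; List and Oid are Postgres types):

/*
 * Illustrative only: the per-session state a pooled backend keeps --
 * the pinned client connection, GUCs changed in the session, the
 * session-private prepared statements, and the session's temp-table
 * namespace.
 */
typedef struct SessionContext
{
	int		client_sock;	/* client connection bound to this session */
	List   *changed_gucs;	/* GUC values to restore on reschedule */
	List   *prepared_stmts;	/* statements with session-mangled names */
	Oid		temp_namespace;	/* session-specific pg_temp_* namespace */
} SessionContext;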


I accept your argument about stateless application architecture.
Moreover, this is more or less the current state of things: most customers 
have to use pgbouncer and so have to prohibit all session-specific stuff in 
their applications.
What are they losing in this case? Prepared statements? But there are 
alternative solutions for those: autoprepare, a shared plan cache, ... 
which allow using prepared statements without a session context. Temporary 
tables, advisory locks, ...?


Temporary tables are actually a very "ugly" thing, causing a lot of problems:
- they can not be created on a hot standby
- they cause catalog bloat
- deallocation of a large number of temporary tables may acquire too many 
locks.
...
Maybe they should somehow be redesigned? For example, have a shared catalog 
entry for a temporary table, but backend-private content... Or make it 
possible to change the lifetime of temporary tables from session to 
transaction...



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-05-04 Thread Merlin Moncure
On Fri, May 4, 2018 at 2:25 PM, Robert Haas  wrote:
> On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:
>> If we are breaking 1:1 backend:session relationship, what controls
>> would we have to manage resource consumption?
>
> I mean, if you have a large number of sessions open, it's going to
> take more memory in any design.  If there are multiple sessions per
> backend, there may be some possibility to save memory by allocating it
> per-backend rather than per-session; it shouldn't be any worse than if
> you didn't have pooling in the first place.

It is absolutely worse, or at least can be.   plpgsql plan caches can
be GUC dependent due to search_path; you might get a different plan
depending on which tables resolve into the function.  You might
rightfully regard this as an edge case but there are other 'leakages',
for example, sessions with different planner settings obviously ought
not to share backend plans.  Point being, there are many
interdependent things in the session that will make it difficult to
share some portions but not others.

> However, I think that's probably worrying about the wrong end of the
> problem first.  IMHO, what we ought to start by doing is considering
> what a good architecture for this would be, and how to solve the
> general problem of per-backend session state.  If we figure that out,
> then we could worry about optimizing whatever needs optimizing, e.g.
> memory usage.

Exactly -- being able to manage down resource consumption by
controlling session count is a major feature that ought not to be
overlooked. So I'm kind of signalling that if given a choice between
that (funneling a large pool of connections down to a smaller number
of backends) and externalized shared sessions I'd rather have the
funnel; it solves a number of very important problems with respect to
server robustness.  So I'm challenging (in a friendly, curious way) if
breaking session:backend 1:1 is really a good idea.  Maybe a
connection pooler implementation can do both of those things or it's
unfair to expect an implementation to do both of them.

merlin



Re: Built-in connection pooling

2018-05-04 Thread Robert Haas
On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:
> If we are breaking 1:1 backend:session relationship, what controls
> would we have to manage resource consumption?

I mean, if you have a large number of sessions open, it's going to
take more memory in any design.  If there are multiple sessions per
backend, there may be some possibility to save memory by allocating it
per-backend rather than per-session; it shouldn't be any worse than if
you didn't have pooling in the first place.

However, I think that's probably worrying about the wrong end of the
problem first.  IMHO, what we ought to start by doing is considering
what a good architecture for this would be, and how to solve the
general problem of per-backend session state.  If we figure that out,
then we could worry about optimizing whatever needs optimizing, e.g.
memory usage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-05-04 Thread Merlin Moncure
On Thu, May 3, 2018 at 12:01 PM, Robert Haas  wrote:
> On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:
>> What _I_ (maybe not others) want is a
>> faster pgbouncer that is integrated into the database; IMO it does
>> everything exactly right.
>
> I have to admit that I find that an amazing statement.  Not that
> pgbouncer is bad technology, but saying that it does everything
> exactly right seems like a vast overstatement.  That's like saying
> that you don't want running water in your house, just a faster motor
> for the bucket you use to draw water from the well.

Well you certainly have a point there; I do have a strong tendency for
overstatement :-).

Let's put it like this: being able to have connections funnel down to
a smaller number of sessions is nice feature.  Applications that are
large, complex, or super high volume have a tendency towards stateless
(with respect to the database session) architecture anyways so I tend
not to mind lack of session features when pooling (prepared statements
perhaps being the big outlier here).  It really opens up a lot of
scaling avenues.  So a better phrased statement might be, "I
like the way pgbouncer works, in particular transaction mode pooling
from the perspective of the applications using it".  Current main pain
points are the previously mentioned administrative headaches and
better performance from a different architecture (pthreads vs libev)
would be nice.

I'm a little skeptical that we're on the right path if we are pushing
a lot of memory consumption into the session level where a session is
pinned all the way back to a client connection. plpgsql function plan
caches can be particularly hungry on memory and since sessions have
their own GUC ISTM each session has to have their own set of them
since plans depend on search path GUC which is session specific.
Previous discussions on managing cache memory consumption (I do dimly
recall you making a proposal on that very thing) centrally haven't
gone past planning stages AFAIK.

If we are breaking 1:1 backend:session relationship, what controls
would we have to manage resource consumption?

merlin



Re: Built-in connection pooling

2018-05-04 Thread Konstantin Knizhnik



On 03.05.2018 20:01, Robert Haas wrote:

On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:

What _I_ (maybe not others) want is a
faster pgbouncer that is integrated into the database; IMO it does
everything exactly right.

I have to admit that I find that an amazing statement.  Not that
pgbouncer is bad technology, but saying that it does everything
exactly right seems like a vast overstatement.  That's like saying
that you don't want running water in your house, just a faster motor
for the bucket you use to draw water from the well.

Maybe if you are engaged in agriculture at your country house, then having 
a well with a good motor pump is better for watering plants than a water 
faucet in your kitchen.
But most homeowners prefer to open a tap to wash their hands rather than 
perform complex manipulations with a motor pump.


I am absolutely sure that external connection poolers will always have 
their niche: they can be used as a natural proxy between multiple clients 
and the DBMS.

Usually HA/load balancing can also be done at this level.

But there are many cases when users just do not want to worry about 
connection pooling: they simply have some number of clients (which can be 
quite large, several times larger than the optimal number of Postgres 
backends) and want them to access the database without introducing 
intermediate layers. In this case a built-in connection pooler will be the 
ideal solution.


This is from the user's point of view. From the Postgres developer's point 
of view, a built-in pooler has some technical advantages compared with an 
external pooler.
Some of these advantages could be matched by a significant redesign of the 
Postgres architecture, for example introducing a shared cache of prepared 
statements...
But in any case, the notion of a session context and the possibility to 
maintain a larger number of open sessions will always be topical.



Some update on the status of the built-in connection pooler prototype: I 
managed to run the regression and isolation tests with pooled connections.
Right now 6 of 185 regression tests and 2 of 67 isolation tests fail.
For the regression tests the results may vary depending on the parallel 
schedule, because of manipulations with roles/permissions which are not 
currently supported.
The best results are for the sequential schedule: 5 failed tests, all 
caused by differences in pg_prepared_statements due to the "mangled" 
prepared statement names.


Failures of isolation tests are caused by unsupported advisory locks.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-05-03 Thread Robert Haas
On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:
> What _I_ (maybe not others) want is a
> faster pgbouncer that is integrated into the database; IMO it does
> everything exactly right.

I have to admit that I find that an amazing statement.  Not that
pgbouncer is bad technology, but saying that it does everything
exactly right seems like a vast overstatement.  That's like saying
that you don't want running water in your house, just a faster motor
for the bucket you use to draw water from the well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-04-28 Thread Konstantin Knizhnik



On 27.04.2018 23:43, Merlin Moncure wrote:

On Fri, Apr 27, 2018 at 11:44 AM, Konstantin Knizhnik
 wrote:


On 27.04.2018 18:33, Merlin Moncure wrote:

On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik
 wrote:

On 27.04.2018 16:49, Merlin Moncure wrote:

I'm confused here...could be language issues or terminology (I'll look
at your latest code).  Here is how I understand things:
Backend=instance of postgres binary
Session=application state within postgres binary (temp tables,
prepared statement etc)
Connection=Client side connection

Backend is a process, forked by postmaster.

right, we are saying the same thing  here.


AIUI (I could certainly be wrong), within connection pooling, ratio
of backend/session is still 1:1.  The idea is that client connections
when they issue SQL to the server reserve a Backend/Session, use it
for the duration of a transaction, and release it when the transaction
resolves.  So many client connections share backends.  As with
pgbouncer, the concept of session in a traditional sense is not really
defined; session state management would be handled within the
application itself, or within data within tables, but not within
backend private memory.  Does that align with your thinking?

No. The number of sessions is equal to the number of client connections.
So a client is not reserving a "Backend/Session" as happens in pgbouncer.
One backend keeps multiple sessions, and for each session it maintains a
session context which includes the client's connection.
And it is the backend's decision which client's transaction it is going to
execute now.
This is why the built-in pooler is able to provide session semantics 
without the backend/session=1:1 requirement.

I see.   I'm not so sure that is a good idea in the general sense :(.
Connection sharing sessions is normal and well understood, and we have
tooling to manage that already (DISCARD).  Having the session state
abstracted out and pinned to the client connection seems complex and
wasteful, at least sometimes.  What _I_ (maybe not others) want is a
faster pgbouncer that is integrated into the database; IMO it does
everything exactly right.
Yandex's Odyssey is a faster version of pgbouncer (supporting 
multithreading and many other things).
Why do you need to integrate it into Postgres if you do not want to 
preserve session semantics? Just to minimize the effort needed to maintain 
extra components?
In principle, a pooler can be distributed as a Postgres extension and 
started as a background worker.

But will it help to eliminate the administration overhead of a separate 
pooler?

In any case, my built-in pooler is oriented towards applications which 
need session semantics (temporary tables, GUCs, prepared statements, ...).
As I have mentioned many times, it is possible to provide that only inside 
the database, not in some external pooler, no matter which architecture it 
has.

merlin


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-27 Thread Merlin Moncure
On Fri, Apr 27, 2018 at 11:44 AM, Konstantin Knizhnik
 wrote:
>
>
> On 27.04.2018 18:33, Merlin Moncure wrote:
>> On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik
>>  wrote:
>>> On 27.04.2018 16:49, Merlin Moncure wrote:
>> I'm confused here...could be language issues or terminology (I'll look
>> at your latest code).  Here is how I understand things:
>> Backend=instance of postgres binary
>> Session=application state within postgres binary (temp tables,
>> prepared statement etc)
>> Connection=Client side connection
>
> Backend is a process, forked by postmaster.

right, we are saying the same thing  here.

>> AIUI (I could certainly be wrong), within connection pooling, ratio
>> of backend/session is still 1:1.  The idea is that client connections
>> when they issue SQL to the server reserve a Backend/Session, use it
>> for the duration of a transaction, and release it when the transaction
>> resolves.  So many client connections share backends.  As with
>> pgbouncer, the concept of session in a traditional sense is not really
>> defined; session state management would be handled within the
>> application itself, or within data within tables, but not within
>> backend private memory.  Does that align with your thinking?
>
> No. The number of sessions is equal to the number of client connections.
> So a client is not reserving a "Backend/Session" as happens in pgbouncer.
> One backend keeps multiple sessions, and for each session it maintains a
> session context which includes the client's connection.
> And it is the backend's decision which client's transaction it is going to
> execute now.
> This is why the built-in pooler is able to provide session semantics
> without the backend/session=1:1 requirement.

I see.   I'm not so sure that is a good idea in the general sense :(.
Connection sharing sessions is normal and well understood, and we have
tooling to manage that already (DISCARD).  Having the session state
abstracted out and pinned to the client connection seems complex and
wasteful, at least sometimes.  What _I_ (maybe not others) want is a
faster pgbouncer that is integrated into the database; IMO it does
everything exactly right.

merlin



Re: Built-in connection pooling

2018-04-27 Thread Konstantin Knizhnik



On 27.04.2018 18:33, Merlin Moncure wrote:

On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik
 wrote:

On 27.04.2018 16:49, Merlin Moncure wrote:

*) How are you pinning client connections to an application managed
transaction? (IMNSHO, this feature is useless without being able to do
that)

Sorry, I do not completely understand the question.
Rescheduling is now done at transaction level - it means that a backend can
not be switched to another session until the current transaction completes.
The main argument for transaction-level pooling is that it allows not
worrying about heavy-weight locks, which are associated with procarray entries.

I'm confused here...could be language issues or terminology (I'll look
at your latest code).  Here is how I understand things:
Backend=instance of postgres binary
Session=application state within postgres binary (temp tables,
prepared statement etc)
Connection=Client side connection

Backend is a process, forked by postmaster.


AIUI (I could certainly be wrong), within connection pooling, ratio
of backend/session is still 1:1.  The idea is that client connections
when they issue SQL to the server reserve a Backend/Session, use it
for the duration of a transaction, and release it when the transaction
resolves.  So many client connections share backends.  As with
pgbouncer, the concept of session in a traditional sense is not really
defined; session state management would be handled within the
application itself, or within data within tables, but not within
backend private memory.  Does that align with your thinking?

No. The number of sessions is equal to the number of client connections.
So a client is not reserving a "Backend/Session" as happens in pgbouncer.
One backend keeps multiple sessions, and for each session it maintains a 
session context which includes the client's connection.
And it is the backend's decision which client's transaction it is going to 
execute now.
This is why the built-in pooler is able to provide session semantics 
without the backend/session=1:1 requirement.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-27 Thread Merlin Moncure
On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik
 wrote:
> On 27.04.2018 16:49, Merlin Moncure wrote:
>> *) How are you pinning client connections to an application managed
>> transaction? (IMNSHO, this feature is useless without being able to do
>> that)
>
> Sorry, I do not completely understand the question.
> Rescheduling is now done at transaction level - it means that a backend can
> not be switched to another session until the current transaction completes.
> The main argument for transaction-level pooling is that it allows not
> worrying about heavy-weight locks, which are associated with procarray entries.

I'm confused here...could be language issues or terminology (I'll look
at your latest code).  Here is how I understand things:
Backend=instance of postgres binary
Session=application state within postgres binary (temp tables,
prepared statement etc)
Connection=Client side connection

AIUI (I could certainly be wrong), within connection pooling, ratio
of backend/session is still 1:1.  The idea is that client connections
when they issue SQL to the server reserve a Backend/Session, use it
for the duration of a transaction, and release it when the transaction
resolves.  So many client connections share backends.  As with
pgbouncer, the concept of session in a traditional sense is not really
defined; session state management would be handled within the
application itself, or within data within tables, but not within
backend private memory.  Does that align with your thinking?

merlin



Re: Built-in connection pooling

2018-04-27 Thread Konstantin Knizhnik



On 27.04.2018 16:49, Merlin Moncure wrote:

On Thu, Apr 26, 2018 at 6:04 AM, Konstantin Knizhnik
 wrote:

On 25.04.2018 20:02, Merlin Moncure wrote:

Yep.  The main workaround today is to disable them.  Having said that,
it's not that difficult to imagine hooking prepared statement creation
to a backend starting up (feature: run X,Y,Z SQL before running user
queries).

Sorry, I do not completely understand your idea.
Yes, it is somehow possible to simulate session semantics by prepending all
session-specific commands (mostly setting GUCs) to each SQL statement.
But it doesn't work for prepared statements: the idea of prepared statements
is that compilation of a statement should be done only once.

The idea is that you have arbitrary SQL that runs after the
backend (postgres binary) is forked from postmaster.  This would be an
ideal place to introduce prepared statements in a way that is pooling
compatible; you still couldn't PREPARE from the application but you'd
be free to call already prepared statements (via SQL level EXECUTE or
libpq PQexecPrepared()).  Of course, if somebody throws a DEALLOCATE
or DISCARD ALL, or issues a problematic DROP x CASCADE, you'd be in
trouble but that's not a big deal IMO because you can control for
those things in the application.
As far as I know, prepared statements can already be handled this way by 
pgbouncer in transaction/statement pooling mode.
But from my point of view, in most cases this approach is practically 
unusable: it is very hard to predict from the very beginning all the 
statements applications will want to execute, and prepare them at backend 
start.



Database performance is mostly limited by disk, so the optimal number of
backends may differ from the number of cores.
But certainly the possibility to launch an "optimal" number of backends is
one of the advantages of built-in session pooling.

Sure, but some workloads are cpu limited (all- or mostly- read with
data < memory, or very complex queries on smaller datasets).   So we
would measure and configure based on expectations exactly as is done
today with pgbouncer.   This is a major feature of pgbouncer: being
able to _reduce_ the number of session states relative to the number
of connections is an important feature; it isolates your database from
various unpleasant failure modes such as runaway memory consumption.

Anyways, I'm looking at your patch.  I see you've separated the client
connection count ('sessions') from the server backend instances
('backends') in the GUC.  Questions:
*) Should non pooled connections be supported simultaneously with
pooled connections?
*) Should there be multiple pools with independent configurations (yes, please)?



Right now my prototype supports two modes:
1. All connections are pooled.
2. There are several session pools, each bound to its own port. 
Connections to the main Postgres port are normal (dedicated).
Connections to one of the session pool ports are redirected to one of the 
workers of this session pool.


Please notice that the latest version of the connection pooler is in the 
https://github.com/postgrespro/postgresql.builtin_pool.git repository.
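For illustration, a client opts into pooling simply by connecting to one 
of the configured pool ports (the port number here is hypothetical; the 
session_pool_ports GUC appears in the test diff at the top of this digest):

#include <libpq-fe.h>

/* Connect through a pooled port instead of the main Postgres port. */
static PGconn *
connect_pooled(void)
{
	return PQconnectdb("host=localhost port=5433 dbname=postgres");
}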

*) How are you pinning client connections to an application managed
transaction? (IMNSHO, this feature is useless without being able to do
that)

Sorry, I do not completely understand the question.
Rescheduling is now done at transaction level - it means that a backend 
can not be switched to another session until the current transaction 
completes.
The main argument for transaction-level pooling is that it allows not 
worrying about heavy-weight locks, which are associated with procarray 
entries.
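In rough pseudocode, the backend's main loop would then look something 
like this (a sketch of the behavior described above; every helper name is 
hypothetical, and SessionContext is the illustrative struct sketched 
earlier in this digest):

/*
 * Transaction-level rescheduling: wait for a session whose client has
 * sent data, restore its state, and serve it until the current
 * transaction completes -- only at that boundary may the backend
 * switch to another session.
 */
for (;;)
{
	SessionContext *session = wait_for_ready_session();	/* e.g. epoll */

	restore_session_state(session);	/* GUCs, statics, temp namespace */
	do
	{
		process_client_message(session);
	} while (transaction_in_progress());	/* no switch mid-transaction */
	save_session_state(session);
}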




FYI, it's pretty clear you've got a long road building consensus and
hammering out a reasonable patch through the community here.  Don't
get discouraged -- there is value here, but it's going to take some
work.

Thank you.
I am absolutely sure that a lot of additional work has to be done before 
this prototype becomes usable.



merlin


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-27 Thread Robert Haas
On Wed, Apr 25, 2018 at 10:09 PM, Michael Paquier  wrote:
> On Wed, Apr 25, 2018 at 03:42:31PM -0400, Robert Haas wrote:
>> The difficulty of finding them all is really the problem.  If we had a
>> reliable way to list everything that needs to be moved into session
>> state, then we could try to come up with a design to do that.
>> Otherwise, we're just swatting issues one by one and I bet we're
>> missing quite a few.
>
> Hm?  We already know about the reset value of a parameter in
> pg_settings, which points out to the value which would be used if reset
> in a session, even after being reloaded.  If you compare it with the
> actual setting value, wouldn't that be enough to know which parameters
> have been changed at session-level by an application once connecting?
> So you can pull out a list using such comparisons.  The context a
> parameter is associated to can also help.

Uh, there's a lot of session backend state other than GUCs.  If the
only thing that we needed to worry about were GUCs, this problem would
have been solved years ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-04-27 Thread Merlin Moncure
On Thu, Apr 26, 2018 at 6:04 AM, Konstantin Knizhnik
 wrote:
> On 25.04.2018 20:02, Merlin Moncure wrote:
>> Yep.  The main workaround today is to disable them.  Having said that,
>> it's not that difficult to imagine hooking prepared statement creation
>> to a backend starting up (feature: run X,Y,Z SQL before running user
>> queries).
>
> Sorry, I do not completely understand your idea.
> Yes, it is somehow possible to simulate session semantics by prepending all
> session-specific commands (mostly setting GUCs) to each SQL statement.
> But it doesn't work for prepared statements: the idea of prepared statements
> is that compilation of a statement should be done only once.

The idea is that you have arbitrary SQL that runs after the
backend (postgres binary) is forked from postmaster.  This would be an
ideal place to introduce prepared statements in a way that is pooling
compatible; you still couldn't PREPARE from the application but you'd
be free to call already prepared statements (via SQL level EXECUTE or
libpq PQexecPrepared()).  Of course, if somebody throws a DEALLOCATE
or DISCARD ALL, or issues a problematic DROP x CASCADE, you'd be in
trouble but that's not a big deal IMO because you can control for
those things in the application.
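For illustration, calling such a pre-prepared statement from libpq might 
look like this ("get_user" is a hypothetical statement name assumed to 
have been created by the startup SQL; PQexecPrepared() itself is real 
libpq API):

#include <libpq-fe.h>

/*
 * No PQprepare() call here: under the scheme above the statement
 * already exists server-side, created when the backend started up.
 */
static PGresult *
fetch_user(PGconn *conn, const char *user_id)
{
	const char *params[1] = { user_id };

	return PQexecPrepared(conn, "get_user",
						  1, params,
						  NULL,	/* paramLengths: text params */
						  NULL,	/* paramFormats: all text */
						  0);	/* text results */
}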

> Database performance is mostly limited by disk, so optimal number of
> backends may be different from number of cores.
> But certainly possibility to launch "optimal" number of backends is one of
> the advantages of builtin session pooling.

Sure, but some workloads are cpu limited (all- or mostly- read with
data < memory, or very complex queries on smaller datasets).   So we
would measure and configure based on expectations exactly as is done
today with pgbouncer.   This is a major feature of pgbouncer: being
able to _reduce_ the number of session states relative to the number
of connections is an important feature; it isolates your database from
various unpleasant failure modes such as runaway memory consumption.

Anyways, I'm looking at your patch.  I see you've separated the client
connection count ('sessions') from the server backend instances
('backends') in the GUC.  Questions:
*) Should non pooled connections be supported simultaneously with
pooled connections?
*) Should there be multiple pools with independent configurations (yes, please)?
*) How are you pinning client connections to an application managed
transaction? (IMNSHO, this feature is useless without being able to do
that)

FYI, it's pretty clear you've got a long road building consensus and
hammering out a reasonable patch through the community here.  Don't
get discouraged -- there is value here, but it's going to take some
work.

merlin



Re: Built-in connection pooling

2018-04-26 Thread Konstantin Knizhnik



On 26.04.2018 05:09, Michael Paquier wrote:

On Wed, Apr 25, 2018 at 03:42:31PM -0400, Robert Haas wrote:

The difficulty of finding them all is really the problem.  If we had a
reliable way to list everything that needs to be moved into session
state, then we could try to come up with a design to do that.
Otherwise, we're just swatting issues one by one and I bet we're
missing quite a few.

Hm?  We already know about the reset value of a parameter in
pg_settings, which points out to the value which would be used if reset
in a session, even after being reloaded.  If you compare it with the
actual setting value, wouldn't that be enough to know which parameters
have been changed at session-level by an application once connecting?
So you can pull out a list using such comparisons.  The context a
parameter is associated to can also help.
--
Michael
Sorry, maybe I do not understand you correctly, but GUCs are already 
handled by the built-in connection pooler.
It is done at the guc.c level, so it doesn't matter how a GUC variable is 
changed: all modified GUCs are saved into the session context and 
restored on reschedule.
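A minimal sketch of what that restore step could look like (everything 
except SetConfigOption(), the real guc.c entry point, is a hypothetical 
name; SessionContext is the illustrative struct sketched earlier in this 
digest):

/*
 * Illustrative only: replay the session's saved GUC assignments when
 * the backend switches to this session.  SavedGuc is an assumed
 * helper struct holding a name/value pair.
 */
static void
restore_session_gucs(SessionContext *session)
{
	ListCell   *lc;

	foreach(lc, session->changed_gucs)
	{
		SavedGuc   *guc = (SavedGuc *) lfirst(lc);

		SetConfigOption(guc->name, guc->value,
						PGC_USERSET, PGC_S_SESSION);
	}
}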


But there are some other static variables which are not related to GUCs. 
Most of them are really associated with the backend, not with the session, 
so they should not be handled on reschedule.
But there may be some variables which are intended to be session specific, 
and locating these variables is a really non-trivial task.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-26 Thread Konstantin Knizhnik



On 25.04.2018 20:02, Merlin Moncure wrote:


Would integrated pooling help the sharding case (genuinely curious)?
I don't quite have my head around the issue.  I've always wanted
pgbouncer to be able to do things like round robin queries to
non-sharded replica for simple load balancing but it doesn't (yet)
have that capability.  That type of functionality would not fit into
an in-core pooler AIUI.  Totally agree that the administrative
benefits (user/role/.conf/etc/etc) is a huge win.


Yes, pgbouncer is not intended to balance workload; you should use 
ha-proxy or pg-pool. libpq now allows specifying multiple URLs, but 
unfortunately libpq is not yet able to perform load balancing.

I do not understand how this is related to integrating connection pooling.
Such a pooler definitely should be external if you want to scatter queries 
between different nodes.



The next most common problem is prepared statements breaking, which certainly 
qualifies as a session-level feature.

Yep.  The main workaround today is to disable them.  Having said that,
it's not that difficult to imagine hooking prepared statement creation
to a backend starting up (feature: run X,Y,Z SQL before running user
queries).


Sorry, I do not completely understand your idea.
Yes, it is somehow possible to simulate session semantics by prepending 
all session-specific commands (mostly setting GUCs) to each SQL statement.
But it doesn't work for prepared statements: the idea of prepared 
statements is that compilation of a statement should be done only once.



  This might be less effort than, uh, moving backend
session state to a shareable object.  I'll go further; managing cache
memory consumption (say for pl/pgsql cached plans) is a big deal for
certain workloads.   The only really effective way to deal with that
is to manage the server connection count and/or recycle server
connections on intervals.  Using pgbouncer to control backend count is
a very effective way to deal with this problem and allowing
virtualized connections to each manage their independent cache would be
a step in the opposite direction. I very much like having control so
that I have exactly 8 backends for my 8 core server with 8 copies of
cache.


Database performance is mostly limited by disk, so the optimal number of 
backends may differ from the number of cores.
But certainly the possibility to launch an "optimal" number of backends is 
one of the advantages of built-in session pooling.




Advisory locks are a completely separate problem.  I suspect they
might be used more than you realize, and they operate against a very
fundamental subsystem of the database: the locking engine.  I'm
struggling as to why we would take another approach than 'don't use
the non-xact variants of them in a pooling environment'.

merlin


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-25 Thread Michael Paquier
On Wed, Apr 25, 2018 at 03:42:31PM -0400, Robert Haas wrote:
> The difficulty of finding them all is really the problem.  If we had a
> reliable way to list everything that needs to be moved into session
> state, then we could try to come up with a design to do that.
> Otherwise, we're just swatting issues one by one and I bet we're
> missing quite a few.

Hm?  We already know about the reset value of a parameter in
pg_settings, which points out to the value which would be used if reset
in a session, even after being reloaded.  If you compare it with the
actual setting value, wouldn't that be enough to know which parameters
have been changed at session-level by an application once connecting?
So you can pull out a list using such comparisons.  The context a
parameter is associated to can also help.
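That comparison is easy to express as a query; a minimal sketch issued 
from a client (the wrapper function is illustrative, the pg_settings 
columns are real):

#include <libpq-fe.h>

/* List GUCs whose current value differs from what RESET would restore,
 * i.e. candidates for parameters changed at session level. */
static PGresult *
list_session_changed_gucs(PGconn *conn)
{
	return PQexec(conn,
				  "SELECT name, setting, reset_val, context "
				  "FROM pg_settings "
				  "WHERE setting IS DISTINCT FROM reset_val");
}

As Robert notes in his reply, though, GUCs are only one part of the 
session state.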
--
Michael




Re: Built-in connection pooling

2018-04-25 Thread Merlin Moncure
On Wed, Apr 25, 2018 at 2:58 PM, Robert Haas  wrote:
> On Wed, Apr 25, 2018 at 10:00 AM, Merlin Moncure  wrote:
>> systems.  If we get that for free I'd be all for it but reading
>> Robert's email I'm skeptical there are easy wins here.  So +1 for
>> further R&D and -1 for holding things up based on full
>> transparency...no harm in shooting for that, but let's look at things
>> from a cost/benefit perspective (IMO).
>
> Also, I think it's worth considering that the kinds of failures users
> will get out of anything that's not handled are really the worst kind.
> If you have an application that relies on session state other than
> what his patch knows how to preserve, your application will appear to
> work in light testing because your connection won't actually be
> swapped out underneath you -- and then fail unpredictably in
> production when such swapping occurs.  There will be no clear way to
> tell which error messages or behavior differences are due to
> limitations of the proposed feature, which ones are due to defects in
> the application, and which ones might be due to PostgreSQL bugs.
> They'll all look the same, and even experienced PG hackers won't

Connection pooling is not a new phenomenon, and many stacks (in
particular java) tend to pool connection by default.  All of the
problems we discuss here for the most part affect competitive
solutions and I humbly submit the tradeoffs are _very_ widely
understood.  FWICT we get occasional reports that are simply and
clearly answered.  I guess there are some people dumb enough to flip
GUC settings involving seemingly important things in production
without testing or reading any documentation or the innumerable
articles and blogs that will pop up...hopefully they are self
selecting out of the industry :-).

Looking at pgbouncer, they produce a chart that says, 'these features
don't work, and please consider that before activating this feature'
(https://wiki.postgresql.org/wiki/PgBouncer#Feature_matrix_for_pooling_modes)
and that ought to be entirely sufficient to avoid that class of
problems.   This is very clear and simple.  The main gripes with
pgbouncer FWICT were relating to the postgres JDBC driver's
unavoidable tendency (later fixed) to prepare 'BEGIN' causing various
problems, which was a bug really (in the JDBC driver) which did in
fact spill into this list.

For this feature to be really attractive we'd want to simultaneously
allow pooled and non-pooled connections on different ports, or even
multiple pools (say, for different applications).  Looking at things
from your perspective, we might want to consider blocking (with error)
features that are not 'pooling compatible' if they arrive through a
pooled connection.

merlin



Re: Built-in connection pooling

2018-04-25 Thread Robert Haas
On Wed, Apr 25, 2018 at 10:00 AM, Merlin Moncure  wrote:
> systems.  If we get that for free I'd be all for it but reading
> Robert's email I'm skeptical there are easy wins here.  So +1 for
> further R&D and -1 for holding things up based on full
> transparency...no harm in shooting for that, but let's look at things
> from a cost/benefit perspective (IMO).

If we could look at a patch and say "here are the cases that this
patch doesn't handle", then we could perhaps decide "we're OK with
that, let's ship the feature and document the limitations".  But right
now it seems to me that we're looking at a feature where no really
systematic effort has been made to list all of the potential failure
modes, and I'm definitely not on board with the idea of shipping
something with a list of cases that are known to work and an unknown
list of failure modes.   Konstantin has fixed things here and there,
but we don't know how much more there is and don't have a
well-designed plan to find all such things.

Also, I think it's worth considering that the kinds of failures users
will get out of anything that's not handled are really the worst kind.
If you have an application that relies on session state other than
what his patch knows how to preserve, your application will appear to
work in light testing because your connection won't actually be
swapped out underneath you -- and then fail unpredictably in
production when such swapping occurs.  There will be no clear way to
tell which error messages or behavior differences are due to
limitations of the proposed feature, which ones are due to defects in
the application, and which ones might be due to PostgreSQL bugs.
They'll all look the same, and even experienced PG hackers won't
easily be able to tell whether a message saying "cursor XYZ doesn't
exist" (or whatever the case is specifically) is because the
application didn't create that cursor and nevertheless tried to use
it, or whether it's because the connection pooling facility silently
threw it out.  All of that sounds to me like it's well below the
standard I'd expect for a core feature.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-04-25 Thread Robert Haas
On Tue, Apr 24, 2018 at 1:00 PM, Konstantin Knizhnik
 wrote:
> My expectation is that there are very few of them which has session-level
> lifetime.
> Unfortunately it is not so easy to locate all such places. Once such
> variables are located, them can be saved in session context and restored on
> reschedule.

The difficulty of finding them all is really the problem.  If we had a
reliable way to list everything that needs to be moved into session
state, then we could try to come up with a design to do that.
Otherwise, we're just swatting issues one by one and I bet we're
missing quite a few.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-04-25 Thread Merlin Moncure
On Wed, Apr 25, 2018 at 9:43 AM, Christophe Pettus  wrote:
>
>> On Apr 25, 2018, at 07:00, Merlin Moncure  wrote:
>> The limitations headaches that I suffer with pgbouncer project (which
>> I love and use often) are mainly administrative and performance
>> related, not lack of session based server features.
>
> For me, the most common issue I run into with pgbouncer (after general 
> administrative overhead of having another moving part) is that it works at 
> cross purposes with database-based sharding, as well as useful role and 
> permissions scheme.  Since each server connection is specific to a 
> database/role pair, you are left with some unappealing options to handle that 
> in a pooling environment.

Would integrated pooling help the sharding case (genuinely curious)?
I don't quite have my head around the issue.  I've always wanted
pgbouncer to be able to do things like round robin queries to
non-sharded replica for simple load balancing but it doesn't (yet)
have that capability.  That type of functionality would not fit into
an in-core pooler AIUI.  Totally agree that the administrative
benefits (user/role/.conf/etc/etc) is a huge win.

> The next most common problem is prepared statements breaking, which 
> certainly qualifies as a session-level feature.

Yep.  The main workaround today is to disable them.  Having said that,
it's not that difficult to imagine hooking prepared statement creation
to a backend starting up (feature: run X,Y,Z SQL before running user
queries).  This might be less effort than, uh, moving backend
session state to a shareable object.  I'll go further; managing cache
memory consumption (say for pl/pgsql cached plans) is a big deal for
certain workloads.   The only really effective way to deal with that
is to manage the server connection count and/or recycle server
connections on intervals.  Using pgbouncer to control backend count is
a very effective way to deal with this problem and allowing
virtualized connections to each manage their independent cache would be
a step in the opposite direction. I very much like having control so
that I have exactly 8 backends for my 8 core server with 8 copies of
cache.

Advisory locks are a completely separate problem.  I suspect they
might be used more than you realize, and they operate against a very
fundamental subsystem of the database: the locking engine.  I'm
struggling to see why we would take any approach other than 'don't use
the non-xact variants of them in a pooling environment'.
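
To make that concrete, a hedged libpq-level sketch (the key value 42 and
the function name are arbitrary): pg_advisory_xact_lock() is released
automatically at commit or rollback, so it stays correct even when a
transaction-level pooler gives the backend to another client afterwards,
whereas the session-level pg_advisory_lock() would leak across clients.

#include <libpq-fe.h>

static void
do_locked_work(PGconn *conn)
{
	PGresult   *res;

	res = PQexec(conn, "BEGIN");
	PQclear(res);

	/* Safe under transaction pooling: the lock dies with the transaction. */
	res = PQexec(conn, "SELECT pg_advisory_xact_lock(42)");
	PQclear(res);

	/* ... work protected by the advisory lock ... */

	res = PQexec(conn, "COMMIT");	/* lock is released here */
	PQclear(res);
}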

merlin



Re: Built-in connection pooling

2018-04-25 Thread Christophe Pettus

> On Apr 25, 2018, at 07:00, Merlin Moncure  wrote:
> The limitations and headaches that I suffer with the pgbouncer project (which
> I love and use often) are mainly administrative and performance
> related, not lack of session-based server features. 

For me, the most common issue I run into with pgbouncer (after the general 
administrative overhead of having another moving part) is that it works at 
cross purposes with database-based sharding, as well as with a useful role and 
permissions scheme.  Since each server connection is specific to a 
database/role pair, you are left with some unappealing options to handle that 
in a pooling environment.

The next most common problem is prepared statements breaking, which certainly 
qualifies as a session-level feature.
--
-- Christophe Pettus
   x...@thebuild.com




Re: Built-in connection pooling

2018-04-25 Thread Konstantin Knizhnik



On 25.04.2018 17:00, Merlin Moncure wrote:

On Wed, Apr 25, 2018 at 12:34 AM, Christophe Pettus  wrote:

On Apr 24, 2018, at 06:52, Merlin Moncure  wrote:
Why does it have to be completely transparent?

The main reason to move it into core is to avoid the limitations that a 
non-core pooler has.

The limitations and headaches that I suffer with the pgbouncer project (which
I love and use often) are mainly administrative and performance
related, not lack of session-based server features.  Applications that
operate over a very large number of virtual connections or engage a
very high level of small transaction traffic are going to avoid
session based features for a lot of other reasons anyways, at least in
my experience.  Probably the most useful feature I miss is async
notifications, so much so that at one point we hacked pgbouncer to
support them.  Point being, full transparency is nice, but there are
workarounds for most of the major issues and there are a lot of side
channel benefits to making your applications 'stateless' (defined as
state in application or database but not in between).

Absent any other consideration, the OP has proven to me that there are
massive potential performance gains possible from moving the pooling
mechanism into the database core process, and I'm already very excited
about not having an extra server process to monitor and worry about.
Tracking session state out of process seems pretty complicated and
would probably add complexity or overhead to multiple internal
systems.  If we get that for free I'd be all for it, but reading
Robert's email I'm skeptical there are easy wins here.  So +1 for
further R&D and -1 for holding things up based on full
transparency...no harm in shooting for that, but let's look at things
from a cost/benefit perspective (IMO).

merlin
I did more research and found several other things which will not work 
with the current built-in connection pooling implementation.
One you have mentioned: the notification mechanism. Another one is advisory 
locks. Right now I have no idea how to support them for pooled sessions,
but I will think about it. IMHO, though, neither notifications nor 
advisory locks are as widely used as temporary tables and 
prepared statements...


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-25 Thread Merlin Moncure
On Wed, Apr 25, 2018 at 12:34 AM, Christophe Pettus  wrote:
>
>> On Apr 24, 2018, at 06:52, Merlin Moncure  wrote:
>> Why does it have to be completely transparent?
>
> The main reason to move it into core is to avoid the limitations that a 
> non-core pooler has.

The limitations and headaches that I suffer with the pgbouncer project (which
I love and use often) are mainly administrative and performance
related, not lack of session-based server features.  Applications that
operate over a very large number of virtual connections or engage a
very high level of small transaction traffic are going to avoid
session based features for a lot of other reasons anyways, at least in
my experience.  Probably the most useful feature I miss is async
notifications, so much so that at one point we hacked pgbouncer to
support them.  Point being, full transparency is nice, but there are
workarounds for most of the major issues and there are a lot of side
channel benefits to making your applications 'stateless' (defined as
state in application or database but not in between).

Absent any other consideration, the OP has proven to me that there are
massive potential performance gains possible from moving the pooling
mechanism into the database core process, and I'm already very excited
about not having an extra server process to monitor and worry about.
Tracking session state out of process seems pretty complicated and
would probably add complexity or overhead to multiple internal
systems.  If we get that for free I'd be all for it, but reading
Robert's email I'm skeptical there are easy wins here.  So +1 for
further R&D and -1 for holding things up based on full
transparency...no harm in shooting for that, but let's look at things
from a cost/benefit perspective (IMO).

merlin



Re: Built-in connection pooling

2018-04-24 Thread Christophe Pettus

> On Apr 24, 2018, at 06:52, Merlin Moncure  wrote:
> Why does it have to be completely transparent? 

Well, we have non-transparent connection pooling now, in the form of pgbouncer, 
and the huge fleet of existing application-stack poolers.  The main reason to 
move it into core is to avoid the limitations that a non-core pooler has.

--
-- Christophe Pettus
   x...@thebuild.com




Re: Built-in connection pooling

2018-04-24 Thread Konstantin Knizhnik



On 23.04.2018 23:14, Robert Haas wrote:

On Wed, Apr 18, 2018 at 9:41 AM, Heikki Linnakangas  wrote:

Well, maybe I missed something, but I do not know how to efficiently
support
1. Temporary tables
2. Prepared statements
3. Session GUCs
with any external connection pooler (with pooling level other than
session).

Me neither. What makes it easier to do these things in an internal
connection pooler? What could the backend do differently, to make these
easier to implement in an external pooler?

I think you and Konstantin are possibly failing to see the big picture
here.  Temporary tables, prepared statements, and GUC settings are
examples of session state that users expect will be preserved for the
lifetime of a connection and not beyond; all session state, of
whatever kind, has the same set of problems.  A transparent connection
pooling experience means guaranteeing that no such state vanishes
before the user ends the current session, and also that no such state
established by some other session becomes visible in the current
session.  And we really need to account for *all* such state, not just
really big things like temporary tables and prepared statements and
GUCs but also much subtler things such as the state of the PRNG
established by srandom().


It is not quite true that I have not realized these issues.
In addition to connection pooling, I have also implemented a pthread 
version of Postgres in which static variables are replaced with 
thread-local variables, letting each thread use its own set of variables.


Unfortunately this approach can not be used for connection pooling.
But I think that performing scheduling at transaction level will 
eliminate the problem with static variables in most cases.
My expectation is that there are very few of them which have 
session-level lifetime.
Unfortunately it is not so easy to locate all such places. Once such 
variables are located, they can be saved in session context and restored 
on reschedule.


More challenging are system static variables which 
can not be easily saved/restored. Your example with srandom is exactly 
such a case.
Right now I do not know any efficient way to suspend/resume a 
pseudo-random sequence.
But frankly speaking, I doubt that this behaviour of random() is so 
unacceptable that it makes the built-in session pool unusable.
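
For what it is worth, POSIX random() does allow its state to be swapped:
initstate() installs a caller-owned state buffer and setstate() switches
between buffers.  A minimal sketch, assuming a hypothetical Session
struct that owns one buffer per pooled session:

#include <stdlib.h>

#define PRNG_STATE_SIZE 128		/* a size accepted by initstate() */

typedef struct Session
{
	char		prng_state[PRNG_STATE_SIZE];
	/* ... other saved session state ... */
} Session;

/* Called once when a pooled session is created. */
static void
session_init_prng(Session *s, unsigned int seed)
{
	initstate(seed, s->prng_state, sizeof(s->prng_state));
}

/* Called on reschedule: random() continues this session's sequence. */
static void
session_resume_prng(Session *s)
{
	/*
	 * random() mutates the installed state array in place, so pointing
	 * the library at the session's buffer resumes its sequence; no
	 * explicit save step is needed when switching away.
	 */
	setstate(s->prng_state);
}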





This is really very similar to the problem that parallel query has
when spinning up new worker backends.  As far as possible, we want the
worker backends to have the same state as the original backend.
However, there's no systematic way of being sure that every relevant
backend-private global, including perhaps globals added by loadable
modules, is in exactly the same state.  For parallel query, we solved
that problem by copying a bunch of things that we knew were
commonly-used (cf. parallel.c) and by requiring functions to be
labeled as parallel-restricted if they rely on any other state.
The problem for connection pooling is much harder.  If you only ever
ran parallel-safe functions throughout the lifetime of a session, then
you would know that the session has no "hidden state" other than what
parallel.c already knows about (except for any functions that are
mislabeled, but we can say that's the user's fault for mislabeling
them).  But as soon as you run even one parallel-restricted or
parallel-unsafe function, there might be a global variable someplace
that holds arbitrary state which the core system won't know anything
about.  If you want to have some other process take over that session,
you need to copy that state to the new process; if you want to reuse
the current process for a new session, you need to clear that state.
Since you don't know it exists or where to find it, and since the code
to copy and/or clear it might not even exist, you can't.

In other words, transparent connection pooling is going to require
some new mechanism, which third-party code will have to know about,
for tracking every last bit of session state that might need to be
preserved or cleared.  That's going to be a big project.  Maybe some
of that can piggyback on existing infrastructure like
InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
state to deal with.  And no out-of-core pooler has a chance of
handling all that stuff correctly; an in-core pooler will be able to
do so only with a lot of work.


I think that the situation with parallel executors is slightly different: 
in this case several backends perform execution of the same query,

so they really need to somehow share/synchronize the state of static variables.
But in the case of connection pooling only one transaction is executed by a 
backend at each moment of time. And there should be no problems with 
static variables unless they cross transaction boundaries. But I do not 
think that there are many such variables.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-24 Thread Adam Brusselback
On Tue, Apr 24, 2018 at 9:52 AM, Merlin Moncure  wrote:
>
> Why does it have to be completely transparent?  As long as the feature
> is optional (say, a .conf setting) the tradeoffs can be managed.  It's
> reasonable to expect to exchange some functionality for pooling;
> pgbouncer provides a 'release' query (say, DISCARD ALL)  to be called
> upon release back to the pool.  Having session state objects (not all
> of which we are talking about; advisory locks and notifications
> deserve consideration) 'just work' would be wonderful but ought not to
> hold up other usages of the feature.
>
> merlin

Just my $0.02, I wouldn't take advantage of this feature as a user
without it being transparent.
I use too many of the features which would be affected by not
maintaining the state.  That's one of the reasons I only use an
external JDBC pooler for my primary application, and plain ole
connections for all of my secondary services which need to just work
with temp tables, session variables, etc.  I'd love it if I could use
one of those poolers (or a built in one) which just magically
increased performance for starting up connections, lowered the
overhead of idle sessions, and didn't mess with session state.

Short of that, I'll take the hit in performance and use more memory
than I should with direct connections for now.

Not sure how other users feel, but that's where I'm sitting for my use case.



Re: Built-in connection pooling

2018-04-24 Thread Merlin Moncure
On Mon, Apr 23, 2018 at 3:14 PM, Robert Haas  wrote:
> In other words, transparent connection pooling is going to require
> some new mechanism, which third-party code will have to know about,
> for tracking every last bit of session state that might need to be
> preserved or cleared.  That's going to be a big project.  Maybe some
> of that can piggyback on existing infrastructure like
> InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
> state to deal with.  And no out-of-core pooler has a chance of
> handling all that stuff correctly; an in-core pooler will be able to
> do so only with a lot of work.

Why does it have to be completely transparent?  As long as the feature
is optional (say, a .conf setting) the tradeoffs can be managed.  It's
reasonable to expect to exchange some functionality for pooling;
pgbouncer provides a 'release' query (say, DISCARD ALL)  to be called
upon release back to the pool.  Having session state objects (not all
of which we are talking about; advisory locks and notifications
deserve consideration) 'just work' would be wonderful but ought not to
hold up other usages of the feature.
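
A minimal sketch of such a release hook on the pooler side (the
pool_release() name is hypothetical; PQexec() and DISCARD ALL are real):
DISCARD ALL drops temporary tables, deallocates prepared statements,
resets GUCs, releases session-level advisory locks, and clears LISTEN
registrations and cached plans before the connection is handed to the
next client.

#include <libpq-fe.h>

/* Called just before a server connection is returned to the pool. */
static void
pool_release(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "DISCARD ALL");

	/* On failure the connection should be closed rather than reused. */
	PQclear(res);
}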

merlin



Re: Built-in connection pooling

2018-04-24 Thread Konstantin Knizhnik



On 23.04.2018 21:56, Robert Haas wrote:

On Fri, Jan 19, 2018 at 11:59 AM, Tomas Vondra
 wrote:

Hmmm, that's unfortunate. I guess you'll have to process the startup packet
in the main process, before it gets forked. At least partially.

I'm not keen on a design that would involve doing more stuff in the
postmaster, because that would increase the chances of the postmaster
accidentally dying, which is really bad.  I've been thinking about the
idea of having a separate "listener" process that receives
connections, and that the postmaster can restart if it fails.  Or
there could even be multiple listeners if needed.  When the listener
gets a connection, it hands it off to another process that then "owns"
that connection.

One problem with this is that the process that's going to take over
the connection needs to get started by the postmaster, not the
listener.  The listener could signal the postmaster to start it, just
like we do for background workers, but that might add a bit of
latency.   So what I'm thinking is that the postmaster could maintain
a small (and configurably-sized) pool of preforked workers.  That
might be worth doing independently, as a way to reduce connection
startup latency, although somebody would have to test it to see
whether it really works... a lot of the startup work can't be done
until we know which database the user wants.



I agree that starting separate "listener" process(es) is the most 
flexible and scalable solution.
I have not implemented this approach due to the problems with forking a 
new backend that you have mentioned.

But certainly they can be addressed.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 09:53:37PM -0400, Bruce Momjian wrote:
> On Mon, Apr 23, 2018 at 09:47:07PM -0400, Robert Haas wrote:
> > On Mon, Apr 23, 2018 at 7:59 PM, Bruce Momjian  wrote:
> > > So, instead of trying to multiplex multiple sessions in a single
> > > operating system process, why don't we try to reduce the overhead of
> > > idle sessions that each have an operating system process?  We already
> > > use procArray to reduce the number of _assigned_ PGPROC entries we have
> > > to scan.  Why can't we create another array that only contains _active_
> > > sessions, i.e. those in a transaction?  In what places can procArray
> > > scans be changed to use this new array?
> > 
> > There are lots of places where scans would benefit, but the cost of
> > maintaining the new array would be very high in some workloads, so I
> > don't think you'd come out ahead overall.  Feel free to code it up and
> > test it, though.
> 
> Well, it would be nice if we knew exactly which scans are slow for a
> large number of idle sessions, and then we could determine what criteria
> for that array would be beneficial --- that seems like the easiest place
> to start.

I guess my point is that if we are looking at trying to store all the session
state in shared memory, so any process can resume it, we might as well
see if we can find a way to more cheaply store the state in an idle
process.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 09:47:07PM -0400, Robert Haas wrote:
> On Mon, Apr 23, 2018 at 7:59 PM, Bruce Momjian  wrote:
> > So, instead of trying to multiplex multiple sessions in a single
> > operating system process, why don't we try to reduce the overhead of
> > idle sessions that each have an operating system process?  We already
> > use procArray to reduce the number of _assigned_ PGPROC entries we have
> > to scan.  Why can't we create another array that only contains _active_
> > sessions, i.e. those in a transaction?  In what places can procArray
> > scans be changed to use this new array?
> 
> There are lots of places where scans would benefit, but the cost of
> maintaining the new array would be very high in some workloads, so I
> don't think you'd come out ahead overall.  Feel free to code it up and
> test it, though.

Well, it would be nice if we knew exactly which scans are slow for a
large number of idle sessions, and then we could determine what criteria
for that array would be beneficial --- that seems like the easiest place
to start.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-04-23 Thread Robert Haas
On Mon, Apr 23, 2018 at 7:59 PM, Bruce Momjian  wrote:
> So, instead of trying to multiplex multiple sessions in a single
> operating system process, why don't we try to reduce the overhead of
> idle sessions that each have an operating system process?  We already
> use procArray to reduce the number of _assigned_ PGPROC entries we have
> to scan.  Why can't we create another array that only contains _active_
> sessions, i.e. those in a transaction?  In what places can procArray
> scans be changed to use this new array?

There are lots of places where scans would benefit, but the cost of
maintaining the new array would be very high in some workloads, so I
don't think you'd come out ahead overall.  Feel free to code it up and
test it, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-04-23 Thread Bruce Momjian
On Fri, Apr 20, 2018 at 11:40:59AM +0300, Konstantin Knizhnik wrote:
> 
> Sorry, maybe we do not understand each other.
> There are the following facts:
> 1. There are some entities in Postgres which are local to a backend:
> temporary tables, GUCs, prepared statements, relation and catalog caches,...
> 2. Postgres doesn't "like" a large number of backends, even if only a few of
> them are actually active. A large number of backends means a large procarray,
> large snapshots,...
> Please refer to my measurements at the beginning of this thread, which
> illustrate how the performance of Postgres degrades with an increasing number
> of backends.

So, instead of trying to multiplex multiple sessions in a single
operating system process, why don't we try to reduce the overhead of
idle sessions that each have an operating system process?  We already
use procArray to reduce the number of _assigned_ PGPROC entries we have
to scan.  Why can't we create another array that only contains _active_
sessions, i.e. those in a transaction?  In what places can procArray
scans be changed to use this new array?
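
As a rough illustration of the data structure being proposed (the names
and size are hypothetical): procnos of active sessions are kept packed in
a dense array so scans touch only active entries, and removal swaps the
last element into the hole, keeping both add and remove O(1).  The hard
part, as Robert notes elsewhere in the thread, is the locking cost of
maintaining this on every transaction start and end.

#define MAX_BACKENDS 1024

static int	activeProcs[MAX_BACKENDS];	/* packed procnos of active sessions */
static int	nActiveProcs = 0;
static int	activeSlot[MAX_BACKENDS];	/* procno -> slot, or -1 if idle */

/* Call at transaction start (under an appropriate lock). */
static void
mark_active(int procno)
{
	activeSlot[procno] = nActiveProcs;
	activeProcs[nActiveProcs++] = procno;
}

/* Call at transaction end (under the same lock). */
static void
mark_idle(int procno)
{
	int			slot = activeSlot[procno];
	int			last = activeProcs[--nActiveProcs];

	activeProcs[slot] = last;	/* move the last entry into the hole */
	activeSlot[last] = slot;
	activeSlot[procno] = -1;
}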

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-04-23 Thread Robert Haas
On Fri, Jan 19, 2018 at 11:59 AM, Tomas Vondra
 wrote:
> Hmmm, that's unfortunate. I guess you'll have to process the startup packet
> in the main process, before it gets forked. At least partially.

I'm not keen on a design that would involve doing more stuff in the
postmaster, because that would increase the chances of the postmaster
accidentally dying, which is really bad.  I've been thinking about the
idea of having a separate "listener" process that receives
connections, and that the postmaster can restart if it fails.  Or
there could even be multiple listeners if needed.  When the listener
gets a connection, it hands it off to another process that then "owns"
that connection.

One problem with this is that the process that's going to take over
the connection needs to get started by the postmaster, not the
listener.  The listener could signal the postmaster to start it, just
like we do for background workers, but that might add a bit of
latency.   So what I'm thinking is that the postmaster could maintain
a small (and configurably-sized) pool of preforked workers.  That
might be worth doing independently, as a way to reduce connection
startup latency, although somebody would have to test it to see
whether it really works... a lot of the startup work can't be done
until we know which database the user wants.
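
For reference, the usual OS-level mechanism for the hand-off described
above is passing the accepted socket over a Unix-domain socket with
SCM_RIGHTS ancillary data.  A minimal sketch (error handling trimmed;
send_fd() is an illustrative name, not an existing PostgreSQL API):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send file descriptor 'fd' to the process at the other end of 'channel'. */
static int
send_fd(int channel, int fd)
{
	struct msghdr msg = {0};
	struct iovec iov;
	char		dummy = 'F';	/* must transfer at least one data byte */
	char		cmsgbuf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr *cmsg;

	iov.iov_base = &dummy;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf;
	msg.msg_controllen = sizeof(cmsgbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* the kernel duplicates the fd */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
}

The receiving worker does the mirror-image recvmsg() and then treats the
received descriptor as its client connection.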

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Built-in connection pooling

2018-04-23 Thread Vladimir Borodin


> On 19 Apr 2018, at 23:59, Andres Freund  wrote:
> 
> I think there's plenty things that don't really make sense solving
> outside of postgres:
> - additional added hop / context switches due to external pooler
> - temporary tables
> - prepared statements
> - GUCs and other session state

+1

> 
> I think there's at least one thing that we should attempt to make
> easier for external pooler:
> - proxy authorization

I suggested it here [1], but a fair number of people argued against it in that 
thread.

[1] 
https://www.postgresql.org/message-id/98C8F3EF-52F0-4AF9-BE81-405C15D77DEA%40simply.name

--
May the force be with you…
https://simply.name



Re: Built-in connection pooling

2018-04-20 Thread Tatsuo Ishii
>> I understand your customers like to have an unlimited number of
>> connections.  But my customers do not. (btw, even with normal
>> PostgreSQL, some of my customers are happily using over 1k, even 5k
>> max_connections).
> 
> If you have a limited number of clients, then you do not need pooling at
> all.

Still, a pooler is needed even if the number of connections is low,
because connecting to PostgreSQL is a very expensive operation, as
everybody knows.

BTW, the main reason why Pgpool-II is used is not that it is a pooler,
but query routing: write queries go to the primary server and read queries to
standbys. This is not possible with a built-in pooler.

>> I am confused.  If so, why do you want to push a statement-based or
>> transaction-based built-in connection pooler?
> 
> I want to provide session semantics but not start a dedicated backend
> for each session.
> Transaction-level rescheduling (rather than statement-level
> rescheduling) is used to avoid the complexity of storing/restoring
> transaction context and maintaining locks.

Not sure if it's acceptable for the community. Probably many developers
want a built-in pooler to keep exactly the same semantics as normal
connections.

Tom Lane wrote:
> FWIW, I concur with Heikki's position that we're going to have very high
> standards for the transparency of any in-core pooler.  Before trying to
> propose a patch, it'd be a good idea to try to fix the perceived
> shortcomings of some existing external pooler.  Only after you can say
> "there's nothing wrong with this that isn't directly connected to its
> not being in-core" does it make sense to try to push the logic into core.

So I would suggest you start with a session-level in-core pooler,
which would be much easier to make transparent than a transaction-level
pooler.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Built-in connection pooling

2018-04-20 Thread Craig Ringer
On Fri., 20 Apr. 2018, 06:59 Andres Freund,  wrote:

> On 2018-04-19 15:01:24 -0400, Tom Lane wrote:
> > Only after you can say "there's nothing wrong with this that isn't
> > directly connected to its not being in-core" does it make sense to try
> > to push the logic into core.
>
> I think there's plenty things that don't really make sense solving
> outside of postgres:
> - additional added hop / context switches due to external pooler
> - temporary tables
> - prepared statements
> - GUCs and other session state
>

Totally agreed. Poolers can make some limited efforts there, but that's all.

Poolers also have a hard time determining if a query is read-only or
read/write, whereas Pg itself has a better chance, and we could help it
along with function READONLY attributes if we wanted. This matters for
master/standby query routing. Standbys being able to proxy for the master
would be fantastic but isn't practical without some kind of pooler.


> I think there's at least one thing that we should attempt to make
> easier for external pooler:
> - proxy authorization
>

Yes, very yes. I've raised this before in a limited form - SET SESSION
AUTHORIZATION that cannot be reset without a cookie value. But true proxy
auth would be better.


Re: Built-in connection pooling

2018-04-20 Thread Konstantin Knizhnik



On 20.04.2018 12:02, Tatsuo Ishii wrote:


I understand your customers like to have an unlimited number of
connections.  But my customers do not. (btw, even with normal
PostgreSQL, some of my customers are happily using over 1k, even 5k
max_connections).


If you have a limited number of clients, then you do not need pooling at all,
with the only exception being clients which for some reason do not want to 
keep connections to the database server open and

prefer to establish a connection on demand and disconnect as soon as possible.
But IMHO in most cases this means bad design of the client application, because 
establishing a connection (even with a connection pooler) is quite an expensive 
operation.

The primary idea and main benefit of a built-in connection pooler is to

support session semantics with a limited number of backends.

I am confused.  If so, why do you want to push a statement-based or
transaction-based built-in connection pooler?


I want to provide session semantics but not start a dedicated backend 
for each session.
Transaction-level rescheduling (rather than statement-level rescheduling) 
is used to avoid the complexity of storing/restoring transaction context 
and maintaining locks.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-20 Thread Tatsuo Ishii
 This only applies to an external-process-type pooler (like Pgpool-II).

> - temporary tables
> - prepared statements
> - GUCs and other session state
 These only apply to a "non session based" pooler, i.e. one sharing a
 database connection among multiple client connections. A "session based"
 connection pooler like Pgpool-II does not have these shortcomings.
>>> But they do not solve the main problem: restricting the number of
>>> launched backends.
>> Pgpool-II already does this. If the number of concurrent clients exceeds
>> max_connections, the max_connections+1'th client has to wait until another
>> client disconnects the session. So "restricting the number of launched
>> backends" is an independent function from whether a "session based"
>> connection pooler is used or not.
> Sorry, but delaying a new client connection until some other client is
> disconnected is not an acceptable solution in most cases.

I just wanted to point out a counter-example to this.

>>> But they do not solve the main problem: restricting the number of
>>> launched backends.

> Most customers want to provide connections to the database server
> for an unlimited (or at least > 100) number of clients.
> And these clients tend to keep connections alive and do not hang up
> after execution of each statement/transaction.
> In this case the approach with session pooling doesn't work.

I understand your customers like to have an unlimited number of
connections.  But my customers do not. (btw, even with normal
PostgreSQL, some of my customers are happily using over 1k, even 5k
max_connections).

>>> Pgbouncer also can be used in session pooling mode. But it makes
>>> sense only if there is a limited number of clients which permanently
>>> connect/disconnect to the database.
>>> But I do not think that this is such a popular use case. Usually there is
>>> a very large number of connected clients which rarely drop connections,
>>> but only a few of them are active at each moment of time.
>> Not necessarily. i.e. session-based poolers allow the use of temporary
>> tables and prepared statements and keep GUCs and other session state,
>> while non-session-based poolers do not allow their use.
>>
>> So choosing "session based poolers" or "non session based poolers" is
>> a trade-off, i.e. let the user choose one of them.
>>
>> If you are willing to merge your connection pooler into core, I would
>> suggest you implement both pool modes.
> 
> 
> Sorry, maybe we do not understand each other.
> There are the following facts:
> 1. There are some entities in Postgres which are local to a backend:
> temporary tables, GUCs, prepared statements, relation and catalog
> caches,...
> 2. Postgres doesn't "like" a large number of backends, even if only a few
> of them are actually active. A large number of backends means a large
> procarray, large snapshots,...
> Please refer to my measurements at the beginning of this thread, which
> illustrate how the performance of Postgres degrades with an increasing
> number of backends.
> 3. Session semantics (prepared statements, GUCs, temporary tables) can
> be supported only in session-level pooling mode.

I agree with 1-3.

> 4. This mode is not acceptable in most cases because it is not
> possible to limit the number of clients which want to establish connections
> with the database server, or keep that number small.
> This is why most pgbouncer users are using statement pooling mode.

Not sure about 4. I rarely see such users around me.

> 5. It doesn't matter how you manage to implement pooling outside
> Postgres: if you want to preserve session semantics, then you need to
> spawn as many backends as sessions. And the number of clients is limited
> by the number of backends/sessions.

Right. I am happy with the limitation for now.

> The primary idea and main benefit of a built-in connection pooler is to
> support session semantics with a limited number of backends.

I am confused.  If so, why do you want to push a statement-based or
transaction-based built-in connection pooler?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Built-in connection pooling

2018-04-20 Thread Konstantin Knizhnik



On 20.04.2018 11:16, Tatsuo Ishii wrote:

On 20.04.2018 01:58, Tatsuo Ishii wrote:

I think there's plenty of things that don't really make sense solving
outside of postgres:
- additional added hop / context switches due to external pooler

This only applies to an external-process-type pooler (like Pgpool-II).


- temporary tables
- prepared statements
- GUCs and other session state

These only apply to a "non session based" pooler, i.e. one sharing a
database connection among multiple client connections. A "session based"
connection pooler like Pgpool-II does not have these shortcomings.

But they do not solve the main problem: restricting the number of
launched backends.

Pgpool-II already does this. If the number of concurrent clients exceeds
max_connections, the max_connections+1'th client has to wait until another
client disconnects the session. So "restricting the number of launched
backends" is an independent function from whether a "session based"
connection pooler is used or not.
Sorry, but delaying a new client connection until some other client is 
disconnected is not an acceptable solution in most cases.
Most customers want to provide connections to the database server for an 
unlimited (or at least > 100) number of clients.
And these clients tend to keep connections alive and do not hang up after 
execution of each statement/transaction.

In this case the approach with session pooling doesn't work.







Pgbouncer also can be used in session pooling mode. But it makes
sense only if there is a limited number of clients which permanently
connect/disconnect to the database.
But I do not think that this is such a popular use case. Usually there is
a very large number of connected clients which rarely drop connections,
but only a few of them are active at each moment of time.

Not necessarily. i.e. session-based poolers allow the use of temporary
tables and prepared statements and keep GUCs and other session state,
while non-session-based poolers do not allow their use.

So choosing "session based poolers" or "non session based poolers" is
a trade-off, i.e. let the user choose one of them.

If you are willing to merge your connection pooler into core, I would
suggest you implement both pool modes.



Sorry, maybe we do not understand each other.
There are the following facts:
1. There are some entities in Postgres which are local to a backend: 
temporary tables, GUCs, prepared statements, relation and catalog caches,...
2. Postgres doesn't "like" a large number of backends, even if only a few of 
them are actually active. A large number of backends means a large 
procarray, large snapshots,...
Please refer to my measurements at the beginning of this thread, which 
illustrate how the performance of Postgres degrades with an increasing number 
of backends.
3. Session semantics (prepared statements, GUCs, temporary tables) can be 
supported only in session-level pooling mode.
4. This mode is not acceptable in most cases because it is not possible 
to limit the number of clients which want to establish connections with the 
database server, or keep that number small.

This is why most pgbouncer users are using statement pooling mode.
5. It doesn't matter how you manage to implement pooling outside 
Postgres: if you want to preserve session semantics, then you need to 
spawn as many backends as sessions. And the number of clients is limited by 
the number of backends/sessions.


The primary idea and main benefit of a built-in connection pooler is to 
support session semantics with a limited number of backends.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-20 Thread Tatsuo Ishii
> On 20.04.2018 01:58, Tatsuo Ishii wrote:
>>> I think there's plenty of things that don't really make sense solving
>>> outside of postgres:
>>> - additional added hop / context switches due to external pooler
>> This only applies to an external-process-type pooler (like Pgpool-II).
>>
>>> - temporary tables
>>> - prepared statements
>>> - GUCs and other session state
>> These only apply to a "non session based" pooler, i.e. one sharing a
>> database connection among multiple client connections. A "session based"
>> connection pooler like Pgpool-II does not have these shortcomings.
> But they do not solve the main problem: restricting the number of
> launched backends.

Pgpool-II already does this. If the number of concurrent clients exceeds
max_connections, the max_connections+1'th client has to wait until another
client disconnects the session. So "restricting the number of launched
backends" is an independent function from whether a "session based"
connection pooler is used or not.

> Pgbouncer also can be used in session pooling mode. But it makes
> sense only if there is a limited number of clients which permanently
> connect/disconnect to the database.
> But I do not think that this is such a popular use case. Usually there is
> a very large number of connected clients which rarely drop connections,
> but only a few of them are active at each moment of time.

Not necessarily. i.e. session-based poolers allow the use of temporary
tables and prepared statements and keep GUCs and other session state,
while non-session-based poolers do not allow their use.

So choosing "session based poolers" or "non session based poolers" is
a trade-off, i.e. let the user choose one of them.

If you are willing to merge your connection pooler into core, I would
suggest you implement both pool modes.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Built-in connection pooling

2018-04-20 Thread Konstantin Knizhnik



On 20.04.2018 03:14, Tatsuo Ishii wrote:

On Fri, Apr 20, 2018 at 07:58:00AM +0900, Tatsuo Ishii wrote:

Yeah. Since SCRAM auth was implemented, some connection poolers
including Pgpool-II have been struggling to adopt it.

Er, well.  pgpool is also taking advantage of MD5 weaknesses...  While
SCRAM fixes this class of problems, and channel binding actually makes
this harder for poolers to deal with.

One of the Pgpool-II developers, Usama, is working hard to re-implement
SCRAM auth for the upcoming Pgpool-II 4.0: i.e. storing passwords (of
course in some encrypted form) in Pgpool-II.



Just to note that authentication is an area where I have no 
experience at all.
So any suggestions or help in developing the right authentication mechanism 
for built-in connection pooling are welcome.


Right now authentication of a pooled session by a shared backend is performed in 
the same way as by a normal (dedicated) Postgres backend.
The postmaster just transfers the accepted socket to one of the workers 
(backends), which performs authentication in the normal way.
It actually means that all sessions scheduled to the same worker should 
access the same database under the same user.
Accepting connections to different databases/users is right now 
supported by making it possible to create several session pools and 
binding each session pool to its own port, at which the postmaster will 
accept connections for this session pool.


As an alternative approach I considered spawning a separate "authentication" 
process (or doing it in the postmaster), which would process the startup packet 
and only after that schedule the session to one of the workers. But such a 
policy is much more difficult to implement, and it is unclear how to map 
database/user pairs to worker backends.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-20 Thread Konstantin Knizhnik



On 20.04.2018 01:58, Tatsuo Ishii wrote:

I think there's plenty of things that don't really make sense solving
outside of postgres:
- additional added hop / context switches due to external pooler

This only applies to an external-process-type pooler (like Pgpool-II).


- temporary tables
- prepared statements
- GUCs and other session state

These only apply to a "non session based" pooler, i.e. one sharing a
database connection among multiple client connections. A "session based"
connection pooler like Pgpool-II does not have these shortcomings.
But they do not solve the main problem: restricting the number of 
launched backends.
Pgbouncer also can be used in session pooling mode. But it makes sense 
only if there is a limited number of clients which permanently 
connect/disconnect to the database.
But I do not think that this is such a popular use case. Usually there is a 
very large number of connected clients which rarely drop connections, but 
only a few of them are active at each moment of time.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-20 Thread Konstantin Knizhnik



On 19.04.2018 17:27, Dave Cramer wrote:



On Thu, Apr 19, 2018, 9:24 AM Konstantin Knizhnik,
<k.knizh...@postgrespro.ru> wrote:




On 19.04.2018 07:46, Tsunakawa, Takayuki wrote:
> From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
> In Oracle, for example, you can create dedicated and non-dedicated backends.
>> I wonder why we do not want to have something similar in Postgres.
> Yes, I want it, too.  In addition to dedicated and shared server
processes, Oracle provides Database Resident Connection Pooling
(DRCP).  I guessed you were inspired by this.
>
> https://docs.oracle.com/cd/B28359_01/server.111/b28310/manproc002.htm#ADMIN12348

It seems that my connection pooling is closer to DRCP than to
shared servers.
It is not clear from this article what these 35KB per client connection
are used for...
It seems to be something similar to the session context used to
suspend/resume sessions.
In my prototype I also maintain some per-session context to keep values
of session-specific GUCs, the temporary namespace, ...
Definitely the pooled session memory footprint depends on the size of the
catalog, prepared statements, updated GUCs,... but 10-100kb seems to be a
reasonable estimation.


>
> BTW, you are doing various great work -- autoprepare,
multithreaded Postgres, built-in connection pooling, etc. etc.,
aren't you?  Are you doing all of these alone?
Yes, but there is a huge distance from a prototype to a product-ready
solution, and I definitely need some help here. This is why I have to
suspend further development of the multithreaded version of Postgres
(it looks like it is not considered a realistic project by the community).
But with built-in connection pooling the situation is better, and I am
going to test it with some of our clients who are interested in this
feature.


Konstantin



It would be useful to test with the JDBC driver

We run into issues with many pool implementations due to our 
opinionated nature


I have tested the built-in connection pool with the YCSB benchmark, which is 
implemented in Java and so works through the JDBC driver.

Results were published in the following mail in this thread:
https://www.postgresql.org/message-id/7359-c582-7a08-5772-cb882988c0ae%40postgrespro.ru


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-04-20 Thread Vladimir Sitnikov
> Development of built-in connection pooling will be continued in
https://github.com/postgrespro/postgresql.builtin_pool.git

The branch (as of 0020c44195992c6dce26baec354a5e54ff30b33f) passes pgjdbc
tests: https://travis-ci.org/vlsi/pgjdbc/builds/368997672

Current tests are mostly single-threaded, so the tests are unlikely to
trigger lots of "concurrent connection" uses.
The next step might be to create multiple schemas, and execute multiple
tests in parallel.

Vladimir


Re: Built-in connection pooling

2018-04-20 Thread Vladimir Sitnikov
Christopher> One of the things that they find likable is that by having
Christopher> the connection pool live in the framework alongside the
Christopher> application, this makes it easy to attach hooks so that the
Christopher> pool can do intelligent things based on application-aware
Christopher> logic.

I'm afraid I do not follow you. Can you please provide an example?

TL;DR:
1) I think in-application pooling would be required for performance reasons
in any case.
2) Out-of-application pooling (in-backend or in-the-middle) is likely
needed as well


JDBC clients use client-side connection pooling for performance reasons:

1) Connection setup does have overhead:
1.1) TCP connection takes time to init/close
1.2) Startup queries involve a couple of roundtrips: "startup packet", then
"SET extra_float_digits = 3", then "SET application_name = '...' "
2) Binary formats on the wire are tied to oids. Clients have to cache the
oids somehow, and "cache per connection" is the current approach.
3) Application threads tend to augment "application_name", "search_path",
etc for their own purposes, and it would slow the application down
significantly if the JDBC driver reverted application_name/search_path/etc for
each and every "connection borrow".
4) I believe there's non-zero overhead for backend process startup

As Konstantin lists in the initial email, the problem is that the backend itself
does not scale well with lots of backend processes.
In other words: it is fine if PostgreSQL is accessed by a single Java
application, since the number of connections would be reasonable (limited by
the Java connection pool).
That, however, is not the case when the DB is accessed by lots of
applications (== lots of idle connections) and/or in case the application is
using short-lived connections (== a missing in-app pool forces backend
processes to come and go).

Vladimir


Re: Built-in connection pooling

2018-04-19 Thread Tatsuo Ishii
> On Fri, Apr 20, 2018 at 07:58:00AM +0900, Tatsuo Ishii wrote:
>> Yeah. Since SCRAM auth was implemented, some connection poolers
>> including Pgpool-II have been struggling to adopt it.
> 
> Er, well.  pgpool is also taking advantage of MD5 weaknesses...  While
> SCRAM fixes this class of problems, and channel binding actually makes
> this harder for poolers to deal with.

One of the Pgpool-II developers, Usama, is working hard to re-implement
SCRAM auth for the upcoming Pgpool-II 4.0: i.e. storing passwords (of
course in some encrypted form) in Pgpool-II.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Built-in connection pooling

2018-04-19 Thread Michael Paquier
On Fri, Apr 20, 2018 at 07:58:00AM +0900, Tatsuo Ishii wrote:
> Yeah. Since SCRAM auth was implemented, some connection poolers
> including Pgpool-II have been struggling to adopt it.

Er, well.  pgpool is also taking advantage of MD5 weaknesses...  While
SCRAM fixes this class of problems, and channel binding actually makes
this harder for poolers to deal with.
--
Michael




Re: Built-in connection pooling

2018-04-19 Thread Tatsuo Ishii
> I think there's plenty of things that don't really make sense solving
> outside of postgres:
> - additional added hop / context switches due to external pooler

This only applies to an external-process-type pooler (like Pgpool-II).

> - temporary tables
> - prepared statements
> - GUCs and other session state

These only apply to a "non session based" pooler, i.e. one sharing a
database connection among multiple client connections. A "session based"
connection pooler like Pgpool-II does not have these shortcomings.

One thing that neither a built-in nor an application-library-type pooler (like
JDBC) can do is handle multiple PostgreSQL servers.

> I think there's at least one thing that we should attempt to make
> easier for external pooler:
> - proxy authorization

Yeah. Since SCRAM auth was implemented, some connection poolers
including Pgpool-II have been struggling to adopt it.

Another thing PostgreSQL could do to make external poolers' lives easier
is to enhance the frontend/backend protocol so that reply messages for
prepare etc. include portal/statement info. But apparently this needs
protocol changes.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Built-in connection pooling

2018-04-19 Thread Andres Freund
On 2018-04-19 15:01:24 -0400, Tom Lane wrote:
> Only after you can say "there's nothing wrong with this that isn't
> directly connected to its not being in-core" does it make sense to try
> to push the logic into core.

I think there's plenty of things that don't really make sense solving
outside of postgres:
- additional added hop / context switches due to external pooler
- temporary tables
- prepared statements
- GUCs and other session state

I think there's at least one thing that we should attempt to make
easier for external pooler:
- proxy authorization

I think in an "ideal world" there's two kinds of poolers: Dumb ones
further out from the database (for short lived processes, keeping the
total number of connections sane, etc) and then more intelligent one
closer to the database.

Greetings,

Andres Freund



Re: Built-in connection pooling

2018-04-19 Thread Tom Lane
Stephen Frost  writes:
> Greetings,
> * Andres Freund (and...@anarazel.de) wrote:
>> On 2018-04-18 06:36:38 -0400, Heikki Linnakangas wrote:
>>> However, I suspect that dealing with *all* of the issues is going to be hard
>>> and tedious. And if there are any significant gaps, things that don't work
>>> correctly with the pooler, the patch will almost certainly be rejected.

>> FWIW, I think that's not the right course. We should work towards an
>> in-core pooler. There's very few postgres installations that don't need
>> one, and there's a lot of things that are very hard to do without closer
>> integration.

> I tend to agree with this and things like trying to proxy authentication
> are really not ideal, since it involves necessairly trusting another
> system.

FWIW, I concur with Heikki's position that we're going to have very high
standards for the transparency of any in-core pooler.  Before trying to
propose a patch, it'd be a good idea to try to fix the perceived
shortcomings of some existing external pooler.  Only after you can say
"there's nothing wrong with this that isn't directly connected to its
not being in-core" does it make sense to try to push the logic into core.

regards, tom lane



Re: Built-in connection pooling

2018-04-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-04-18 06:36:38 -0400, Heikki Linnakangas wrote:
> > On 18/04/18 06:10, Konstantin Knizhnik wrote:
> > > But there are still use cases which can not be covered by an external
> > > connection pooler.
> > 
> > Can you name some? I understand that the existing external connection
> > poolers all have their limitations. But are there some fundamental issues
> > that can *only* be addressed by a built-in implementation?
> > 
> > For the record, I think an internal connection pool might be a good idea. It
> > would presumably be simpler to set up than an external one, for example. But
> > it depends a lot on the implementation. If we had an internal connection
> > pool, I would expect it to be very transparent to the user, be simple to set
> > up, and not have annoying limitations with prepared statements, temporary
> > tables, etc. that the existing external ones have.
> > 
> > However, I suspect that dealing with *all* of the issues is going to be hard
> > and tedious. And if there are any significant gaps, things that don't work
> > correctly with the pooler, the patch will almost certainly be rejected.
> > 
> > I'd recommend that you put your effort in improving the existing external
> > connection poolers. Which one is closest to suit your needs? What's missing?
> > 
> > There are probably things we could do in the server, to help external
> > connection poolers. For example, some kind of a proxy authentication, where
> > the connection pooler could ask the backend to do authentication on its
> > behalf, so that you wouldn't need to re-implement the server-side
> > authentication code in the external pooler. Things like that.
> 
> FWIW, I think that's not the right course. We should work towards an
> in-core pooler. There's very few postgres installations that don't need
> one, and there's a lot of things that are very hard to do without closer
> integration.

I tend to agree with this, and things like trying to proxy authentication
are really not ideal, since it involves necessarily trusting another
system.  Perhaps it'd be nice to be able to proxy auth cleanly, and in
some cases it may be required to have another system involved (I've
certainly seen cases of multi-layered pgbouncer), but I'd rather only do
that when we need to instead of almost immediately...

Thanks!

Stephen




Re: Built-in connection pooling

2018-04-19 Thread Andres Freund
On 2018-04-18 06:36:38 -0400, Heikki Linnakangas wrote:
> On 18/04/18 06:10, Konstantin Knizhnik wrote:
> > But there are still use cases which can not be covered by an external
> > connection pooler.
> 
> Can you name some? I understand that the existing external connection
> poolers all have their limitations. But are there some fundamental issues
> that can *only* be addressed by a built-in implementation?
> 
> For the record, I think an internal connection pool might be a good idea. It
> would presumably be simpler to set up than an external one, for example. But
> it depends a lot on the implementation. If we had an internal connection
> pool, I would expect it to be very transparent to the user, be simple to set
> up, and not have annoying limitations with prepared statements, temporary
> tables, etc. that the existing external ones have.
> 
> However, I suspect that dealing with *all* of the issues is going to be hard
> and tedious. And if there are any significant gaps, things that don't work
> correctly with the pooler, the patch will almost certainly be rejected.
> 
> I'd recommend that you put your effort in improving the existing external
> connection poolers. Which one is closest to suit your needs? What's missing?
> 
> There are probably things we could do in the server, to help external
> connection poolers. For example, some kind of a proxy authentication, where
> the connection pooler could ask the backend to do authentication on its
> behalf, so that you wouldn't need to re-implement the server-side
> authentication code in the external pooler. Things like that.

FWIW, I think that's not the right course. We should work towards an
in-core pooler. There's very few postgres installations that don't need
one, and there's a lot of things that are very hard to do without closer
integration.

Greetings,

Andres Freund



Re: Built-in connection pooling

2018-04-19 Thread Dave Cramer
On Thu, Apr 19, 2018, 9:24 AM Konstantin Knizhnik, <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 19.04.2018 07:46, Tsunakawa, Takayuki wrote:
> > From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
> > In Oracle, for example, you can create dedicated and non-dedicated backends.
> >> I wonder why we do not want to have something similar in Postgres.
> > Yes, I want it, too.  In addition to dedicated and shared server
> processes, Oracle provides Database Resident Connection Pooling (DRCP).  I
> guessed you were inspired by this.
> >
> >
> https://docs.oracle.com/cd/B28359_01/server.111/b28310/manproc002.htm#ADMIN12348
>
> It seems that my connection pooling is closer to DRCP than to
> shared servers.
> It is not clear from this article what these 35KB per client connection
> are used for...
> It seems to be something similar to the session context used to
> suspend/resume sessions.
> In my prototype I also maintain some per-session context to keep values
> of session-specific GUCs, the temporary namespace, ...
> Definitely the pooled session memory footprint depends on the size of the
> catalog, prepared statements, updated GUCs,... but 10-100kb seems to be a
> reasonable estimation.
>
>
> >
> > BTW, you are doing various great work -- autoprepare, multithreaded
> Postgres, built-in connection pooling, etc. etc., aren't you?  Are you
> doing all of these alone?
> Yes, but there is a huge distance from a prototype to a product-ready
> solution, and I definitely need some help here. This is why I have to
> suspend further development of the multithreaded version of Postgres (it looks
> like it is not considered a realistic project by the community).
> But with built-in connection pooling the situation is better, and I am going
> to test it with some of our clients who are interested in this feature.
>
>
> Konstantin



It would be useful to test with the JDBC driver

We run into issues with many pool implementations due to our opinionated
nature

Thanks

Dave

> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: Built-in connection pooling

2018-04-19 Thread Konstantin Knizhnik



On 19.04.2018 07:46, Tsunakawa, Takayuki wrote:

From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
In Oracle, for example, you can create dedicated and non-dedicated backends.

I wonder why we do not want to have something similar in Postgres.

Yes, I want it, too.  In addition to dedicated and shared server processes, 
Oracle provides Database Resident Connection Pooling (DRCP).  I guessed you 
were inspired by this.

https://docs.oracle.com/cd/B28359_01/server.111/b28310/manproc002.htm#ADMIN12348


It seems that my connection pooling is closer to DRCP than to shared 
servers.
It is not clear from this article what the 35KB per client connection 
is used for...
It seems to be something similar to the session context used to 
suspend/resume sessions.
In my prototype I also maintain some per-session context to keep values 
of session-specific GUCs, the temporary namespace, ...
Certainly the pooled session memory footprint depends on the size of the 
catalog, prepared statements, updated GUCs, ... but 10-100KB seems to be 
a reasonable estimate.





BTW, you are doing various great work -- autoprepare, multithreaded Postgres, 
built-in connection pooling, etc. etc., aren't you?  Are you doing all of these 
alone?
Yes, but there is a huge distance from a prototype to a product-ready 
solution. And I definitely need some help here. This is why I have had to 
suspend further development of the multithreaded version of Postgres (it 
looks like the community does not consider it a realistic project).
But with built-in connection pooling the situation is better, and I am 
going to test it with some of our clients who are interested in this feature.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




RE: Built-in connection pooling

2018-04-18 Thread Tsunakawa, Takayuki
From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
> In Oracle, for example, you can create dedicated and non-dedicated backends.
> I wonder why we do not want to have something similar in Postgres.

Yes, I want it, too.  In addition to dedicated and shared server processes, 
Oracle provides Database Resident Connection Pooling (DRCP).  I guessed you 
were inspired by this.

https://docs.oracle.com/cd/B28359_01/server.111/b28310/manproc002.htm#ADMIN12348

BTW, you are doing various great work -- autoprepare, multithreaded Postgres, 
built-in connection pooling, etc. etc., aren't you?  Are you doing all of these 
alone?

Regards
Takayuki Tsunakawa





Re: Built-in connection pooling

2018-04-18 Thread Vladimir Borodin


> On 18 Apr 2018, at 16:24, David Fetter wrote:
> 
> On Wed, Apr 18, 2018 at 02:52:39PM +0300, Konstantin Knizhnik wrote:
>> The Yandex team is following this approach with their Odysseus
>> (a multithreaded version of pgbouncer with many of pgbouncer's issues
>> fixed).
> 
> Have they opened the source to Odysseus?  If not, do they have plans to?

No, we haven't yet. But we plan to do it by the end of May.

> 
> Best,
> David.
> -- 
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
> 
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
> 

--
May the force be with you…
https://simply.name



Re: Built-in connection pooling

2018-04-18 Thread Konstantin Knizhnik



On 18.04.2018 16:41, Heikki Linnakangas wrote:

On 18/04/18 07:52, Konstantin Knizhnik wrote:



On 18.04.2018 13:36, Heikki Linnakangas wrote:

On 18/04/18 06:10, Konstantin Knizhnik wrote:

But there are still use cases which cannot be covered by an external
connection pooler.


Can you name some? I understand that the existing external connection
poolers all have their limitations. But are there some fundamental
issues that can *only* be addressed by a built-in implementation?


Well, maybe I missed something, but I do not know how to efficiently
support
1. Temporary tables
2. Prepared statements
3. Session GUCs
with any external connection pooler (with a pooling level other than 
session).


Me neither. What makes it easier to do these things in an internal 
connection pooler? What could the backend do differently, to make 
these easier to implement in an external pooler?


All these things are addressed now in my built-in connection pool 
implementation:
1. Temporary tables are maintained by creating a private temporary 
namespace for each session.
2. Prepared statements are supported by adding a unique session prefix to 
each prepared statement name (so there is a single prepared statement cache 
in the backend, but each session has its own prepared statements).
3. Each session maintains a list of updated GUCs, and they are 
saved/restored on reschedule.


It was not so difficult to implement all this stuff (the main troubles I 
had were with GUCs), but it looks like none of this is possible for an 
external connection pooler.
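
For illustration, here is a minimal C sketch of the save/restore idea in
point 3. The SessionGuc struct and RestoreSessionGucs function are
illustrative names, not the actual patch code; SetConfigOption is the
stock backend entry point for applying a GUC value:

/* Illustrative sketch only -- not the actual patch code. */
typedef struct SessionGuc
{
	struct SessionGuc *next;
	char	   *name;			/* GUC name, e.g. "work_mem" */
	char	   *value;			/* value assigned by this session */
} SessionGuc;

/*
 * On reschedule, re-apply the GUC assignments recorded for the session
 * being switched in, so the backend sees that session's settings.
 */
static void
RestoreSessionGucs(SessionGuc *gucs)
{
	SessionGuc *g;

	for (g = gucs; g != NULL; g = g->next)
		SetConfigOption(g->name, g->value, PGC_USERSET, PGC_S_SESSION);
}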



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-18 Thread Heikki Linnakangas

On 18/04/18 07:52, Konstantin Knizhnik wrote:



On 18.04.2018 13:36, Heikki Linnakangas wrote:

On 18/04/18 06:10, Konstantin Knizhnik wrote:

But there are still use cases which cannot be covered by an external
connection pooler.


Can you name some? I understand that the existing external connection
poolers all have their limitations. But are there some fundamental
issues that can *only* be addressed by a built-in implementation?


Well, maybe I missed something, but I do not know how to efficiently
support
1. Temporary tables
2. Prepared statements
3. Session GUCs
with any external connection pooler (with a pooling level other than session).


Me neither. What makes it easier to do these things in an internal 
connection pooler? What could the backend do differently, to make these 
easier to implement in an external pooler?


- Heikki



Re: Built-in connection pooling

2018-04-18 Thread Konstantin Knizhnik



On 18.04.2018 16:24, David Fetter wrote:

On Wed, Apr 18, 2018 at 02:52:39PM +0300, Konstantin Knizhnik wrote:

Yandex team is following this approach with theirOdysseus
(multithreaded version of pgbouncer with many of pgbouncer issues
fixed).

Have they opened the source to Odysseus?  If not, do they have plans to?


It is better to ask Vladimir Borodin (Yandex) about it.
But as far as I know, the answer is yes.
The Yandex policy is to make their products available to the community.
I just wonder why the community was not interested in knowing the details 
of this project...


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-18 Thread Konstantin Knizhnik



On 18.04.2018 16:09, Craig Ringer wrote:

On 18 April 2018 at 19:52, Konstantin Knizhnik
 wrote:


As far as I know, most DBMSes have some kind of internal connection
pooling.
In Oracle, for example, you can create dedicated and non-dedicated backends.
I wonder why we do not want to have something similar in Postgres.

I want to, and I know many others do too.

But the entire PostgreSQL architecture makes it hard to do well, and
means it requires heavy changes to do it in a way that will be
maintainable and reliable.

Making it work, and making something maintainable and mergeable, are
two different things. Something I continue to struggle with myself.


Here I completely agree with you.
Now my prototype "works": it is able to correctly handle errors, 
transaction rollbacks, long-living transactions, ... but I am quite 
sure that there are a lot of untested cases where it will work 
incorrectly. But still, I do not think that making built-in connection 
pooling really reliable is something unreachable.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-04-18 Thread David Fetter
On Wed, Apr 18, 2018 at 02:52:39PM +0300, Konstantin Knizhnik wrote:
> The Yandex team is following this approach with their Odysseus
> (a multithreaded version of pgbouncer with many of pgbouncer's issues
> fixed).

Have they opened the source to Odysseus?  If not, do they have plans to?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Built-in connection pooling

2018-04-18 Thread Craig Ringer
On 18 April 2018 at 19:52, Konstantin Knizhnik
 wrote:

> As far as I know, most DBMSes have some kind of internal connection
> pooling.
> In Oracle, for example, you can create dedicated and non-dedicated backends.
> I wonder why we do not want to have something similar in Postgres.

I want to, and I know many others do too.

But the entire PostgreSQL architecture makes it hard to do well, and
means it requires heavy changes to do it in a way that will be
maintainable and reliable.

Making it work, and making something maintainable and mergeable, are
two different things. Something I continue to struggle with myself.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Built-in connection pooling

2018-04-18 Thread Konstantin Knizhnik



On 18.04.2018 13:36, Heikki Linnakangas wrote:

On 18/04/18 06:10, Konstantin Knizhnik wrote:

But there are still use cases which cannot be covered by an external
connection pooler.


Can you name some? I understand that the existing external connection 
poolers all have their limitations. But are there some fundamental 
issues that can *only* be addressed by a built-in implementation?


Well, maybe I missed something, but I do not know how to efficiently 
support

1. Temporary tables
2. Prepared statements
3. Session GUCs
with any external connection pooler (with a pooling level other than session).

The problem with GUCs seems to be the easiest of these three: we can 
just keep a list of GUC assignments and prepend it to each statement. But 
it is not very efficient and can cause some problems (for example, there 
are some statements which cannot be executed in a multi-statement context).
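
As a rough illustration of this prepend approach (a sketch only; the
function name is made up, and quoting/escaping and error handling are
omitted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rebuild a query as "SET g1 = v1; SET g2 = v2; ... <query>" -- which is
 * exactly what creates the multi-statement context mentioned above.
 */
static char *
prepend_gucs(const char *names[], const char *values[], int n,
			 const char *query)
{
	size_t		len = strlen(query) + 1;
	char	   *result;
	int			i;

	for (i = 0; i < n; i++)
		len += strlen("SET  = ; ") + strlen(names[i]) + strlen(values[i]);
	result = malloc(len);
	result[0] = '\0';
	for (i = 0; i < n; i++)
		sprintf(result + strlen(result), "SET %s = %s; ",
				names[i], values[i]);
	strcat(result, query);
	return result;
}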


The prepared statement problem can be fixed either by implementing a shared 
plan cache or by autoprepare (I have proposed a patch for it).


But concerning temporary tables, I do not know of any acceptable solution.



For the record, I think an internal connection pool might be a good 
idea. It would presumably be simpler to set up than an external one, 
for example. But it depends a lot on the implementation. If we had an 
internal connection pool, I would expect it to be very transparent to 
the user, be simple to set up, and not have annoying limitations with 
prepared statements, temporary tables, etc. that the existing external 
ones have.


However, I suspect that dealing with *all* of the issues is going to 
be hard and tedious. And if there are any significant gaps, things 
that don't work correctly with the pooler, the patch will almost 
certainly be rejected.


I'd recommend that you put your effort in improving the existing 
external connection poolers. Which one is closest to suit your needs? 
What's missing?


The Yandex team is following this approach with their Odysseus (a 
multithreaded version of pgbouncer with many of pgbouncer's issues fixed).
But it will not work for 1C, which needs to keep sessions (with 
temporary tables, etc.) for a large number of clients which never close 
connections.




There are probably things we could do in the server, to help external 
connection poolers. For example, some kind of a proxy authentication, 
where the connection pooler could ask the backend to do authentication 
on its behalf, so that you wouldn't need to re-implement the 
server-side authentication code in the external pooler. Things like that.


As far as I know, most DBMSes have some kind of internal connection 
pooling.

In Oracle, for example, you can create dedicated and non-dedicated backends.
I wonder why we do not want to have something similar in Postgres.
Any external connection pooler will be less convenient for users than an 
internal pooler.
It may be more flexible, more error-protected, more scalable, ... But 
still, it is an extra entity which adds extra overhead and can also become 
a bottleneck or a SPoF.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-04-18 Thread Heikki Linnakangas

On 18/04/18 06:10, Konstantin Knizhnik wrote:

But there are still use cases which cannot be covered by an external
connection pooler.


Can you name some? I understand that the existing external connection 
poolers all have their limitations. But are there some fundamental 
issues that can *only* be addressed by a built-in implementation?


For the record, I think an internal connection pool might be a good 
idea. It would presumably be simpler to set up than an external one, for 
example. But it depends a lot on the implementation. If we had an 
internal connection pool, I would expect it to be very transparent to 
the user, be simple to set up, and not have annoying limitations with 
prepared statements, temporary tables, etc. that the existing external 
ones have.


However, I suspect that dealing with *all* of the issues is going to be 
hard and tedious. And if there are any significant gaps, things that 
don't work correctly with the pooler, the patch will almost certainly be 
rejected.


I'd recommend that you put your effort in improving the existing 
external connection poolers. Which one is closest to suit your needs? 
What's missing?


There are probably things we could do in the server, to help external 
connection poolers. For example, some kind of a proxy authentication, 
where the connection pooler could ask the backend to do authentication 
on its behalf, so that you wouldn't need to re-implement the server-side 
authentication code in the external pooler. Things like that.


- Heikki



Re: Built-in connection pooling

2018-04-18 Thread Konstantin Knizhnik

On 17.04.2018 20:09, Nikolay Samokhvalov wrote:

Understood.

One more question. Have you considered creating the pooling tool as a 
separate, not built-in tool that is shipped with Postgres — like 
psql is shipped in packages usually called “postgresql-client-XX”, 
which makes psql the default tool to work with in the terminal? I 
constantly hear the opinion from various users that Postgres needs a 
“default”/official pooling tool.


There have been a lot of discussions in hackers and in other mailing 
lists/forums concerning PostgreSQL and connection pooling.
From the point of view of many PostgreSQL users whom I know myself, the 
lack of standard (built-in?) connection pooling is one of the main 
drawbacks of PostgreSQL.

Right now we have pgbouncer, which is small, fast and reliable, but:
- Doesn't allow you to use prepared statements, temporary tables and 
session variables.
- Is single-threaded, so it becomes a bottleneck for a large (>100) number 
of active connections.
- Cannot be used for load balancing across hot standby replicas.

So if you have a lot of active connections, you will have to set up a pool 
of pgbouncers.
There is also pgpool, which supports load balancing but doesn't perform 
session pooling, so it has to be used together with pgbouncer.
So to be able to use Postgres in an enterprise system, you will have to 
set up a very complex pipeline of different tools.


We definitely need some standard solution for it. As far as I know, 
Yandex is now working on their own version of an external connection 
pooler which can eliminate the single-threaded limitation of pgbouncer. 
Unfortunately their presentation was not accepted for pgconf (nor was 
my presentation about built-in connection pooling).


An external connection pooler definitely provides more flexibility than a 
built-in connection pooler. It can be installed at the client side, at 
the server side, or somewhere between them.
Also, it is more reliable, because it changes nothing in the Postgres 
architecture.
But there are still use cases which cannot be covered by an external 
connection pooler.
The 1C company (the Russian SAP), in a presentation at PgConf.ru 2018, 
mentioned that the lack of internal pooling is the main limiting factor 
for replacing MS-SQL with Postgres.
They have a lot of clients which never close connections, and they need 
persistent sessions because of their wide use of temporary tables.
This is why 1C cannot use pgbouncer. We are now trying to provide them 
with a prototype version of Postgres with a built-in connection pool.
If the results of these experiments are successful, we will propose this 
connection pooler to the community (but it is available right now, so 
anybody who wants to can test it).



On Tue, 17 Apr 2018 at 0:44, Konstantin Knizhnik wrote:




On 13.04.2018 19:07, Nikolay Samokhvalov wrote:

On Fri, Apr 13, 2018 at 2:59 AM, Konstantin Knizhnik wrote:

Development in built-in connection pooling will be continued
in https://github.com/postgrespro/postgresql.builtin_pool.git
I am not going to send new patches to hackers mailing list
any more.


Why?


I just do not want to spam hackers with a lot of patches.
Also, since I received little feedback in this thread, I conclude
that this topic is not so interesting to the community.

Please notice that the built-in connection pool is in the conn_pool branch.


-- 
Konstantin Knizhnik

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-04-17 Thread Nikolay Samokhvalov
Understood.

One more question. Have you considered creating the pooling tool as a
separate, not built-in tool that is shipped with Postgres — like psql is
shipped in packages usually called “postgresql-client-XX”, which makes psql
the default tool to work with in the terminal? I constantly hear the opinion
from various users that Postgres needs a “default”/official pooling tool.

On Tue, 17 Apr 2018 at 0:44, Konstantin Knizhnik wrote:

>
>
> On 13.04.2018 19:07, Nikolay Samokhvalov wrote:
>
> On Fri, Apr 13, 2018 at 2:59 AM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>>
>> Development in built-in connection pooling will be continued in
>> https://github.com/postgrespro/postgresql.builtin_pool.git
>> I am not going to send new patches to hackers mailing list any more.
>>
>
> Why?
>
>
> I just do not want to spam hackers with a lot of patches.
> Also, since I received little feedback in this thread, I conclude that this
> topic is not so interesting to the community.
>
> Please notice that the built-in connection pool is in the conn_pool branch.
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: Built-in connection pooling

2018-04-17 Thread Konstantin Knizhnik



On 13.04.2018 19:07, Nikolay Samokhvalov wrote:
On Fri, Apr 13, 2018 at 2:59 AM, Konstantin Knizhnik wrote:


Development in built-in connection pooling will be continued in
https://github.com/postgrespro/postgresql.builtin_pool.git

I am not going to send new patches to hackers mailing list any more.


Why?


I just do not want to spam hackers with a lot of patches.
Also, since I received little feedback in this thread, I conclude that this 
topic is not so interesting to the community.


Please notice that the built-in connection pool is in the conn_pool branch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-04-13 Thread Nikolay Samokhvalov
On Fri, Apr 13, 2018 at 2:59 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:
>
> Development in built-in connection pooling will be continued in
> https://github.com/postgrespro/postgresql.builtin_pool.git
> I am not going to send new patches to hackers mailing list any more.
>

Why?


Re: Built-in connection pooling

2018-04-13 Thread Konstantin Knizhnik



On 06.04.2018 20:03, Konstantin Knizhnik wrote:



On 06.04.2018 20:00, Konstantin Knizhnik wrote:
Attached please find a new version of the patch with several bug fixes 
plus support for more than one session pool associated with different 
ports.
Now it is possible to make the postmaster listen on several ports for 
accepting pooled connections, while leaving the main Postgres port for 
dedicated backends.
Each session pool is intended to be used for a particular database/user 
combination.



Sorry, wrong patch was attached.



Development in built-in connection pooling will be continued in 
https://github.com/postgrespro/postgresql.builtin_pool.git

I am not going to send new patches to hackers mailing list any more.
The last added feature is support for idle_in_transaction_session_timeout, 
which is especially critical for a built-in pool with transaction-level 
scheduling, because a long transaction can block other sessions executing 
on the same backend.
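
For reference, that is the stock server GUC; e.g. in postgresql.conf (the
value here is just an example):

idle_in_transaction_session_timeout = '30s'	# terminate any session whose
						# transaction stays idle longer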


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-04-06 Thread Konstantin Knizhnik



On 06.04.2018 20:00, Konstantin Knizhnik wrote:
Attached please find a new version of the patch with several bug fixes 
plus support for more than one session pool associated with different ports.
Now it is possible to make the postmaster listen on several ports for 
accepting pooled connections, while leaving the main Postgres port for 
dedicated backends.
Each session pool is intended to be used for a particular database/user 
combination.



Sorry, wrong patch was attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 93c4bbf..dfc072c 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -194,6 +194,7 @@ char	   *namespace_search_path = NULL;
 /* Local functions */
 static void recomputeNamespacePath(void);
 static void InitTempTableNamespace(void);
+static Oid  GetTempTableNamespace(void);
 static void RemoveTempRelations(Oid tempNamespaceId);
 static void RemoveTempRelationsCallback(int code, Datum arg);
 static void NamespaceCallback(Datum arg, int cacheid, uint32 hashvalue);
@@ -441,9 +442,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (strcmp(newRelation->schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(newRelation->schemaname, false);
@@ -452,9 +451,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 	else if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 	else
 	{
@@ -463,8 +460,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -2902,9 +2898,7 @@ LookupCreationNamespace(const char *nspname)
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 
 	namespaceId = get_namespace_oid(nspname, false);
@@ -2967,9 +2961,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (strcmp(schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(schemaname, false);
@@ -2982,8 +2974,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -3250,8 +3241,11 @@ void
 SetTempNamespaceState(Oid tempNamespaceId, Oid tempToastNamespaceId)
 {
 	/* Worker should not have created its own namespaces ... */
-	Assert(myTempNamespace == InvalidOid);
-	Assert(myTempToastNamespace == InvalidOid);
+	if (!ActiveSession)
+	{
+		Assert(myTempNamespace == InvalidOid);
+		Assert(myTempToastNamespace == InvalidOid);
+	}
 	Assert(myTempNamespaceSubID == InvalidSubTransactionId);
 
 	/* Assign same namespace OIDs that leader has */
@@ -3771,6 +3765,22 @@ recomputeNamespacePath(void)
 	list_free(oidlist);
 }
 
+static Oid
+GetTempTableNamespace(void)
+{
+	if (ActiveSession)
+	{
+		if (!OidIsValid(ActiveSession->tempNamespace))
+			InitTempTableNamespace();
+	}
+	else
+	{
+		if (!OidIsValid(myTempNamespace))
+			InitTempTableNamespace();
+	}
+	return myTempNamespace;
+}
+
 /*
  * InitTempTableNamespace
  *		Initialize temp table namespace on first use in a particular backend
@@ -3782,8 +3792,6 @@ InitTempTableNamespace(void)
 	Oid			namespaceId;
 	Oid			toastspaceId;
 
-	Assert(!OidIsValid(myTempNamespace));
-
 	/*
 	 * First, do permission check to see if we are authorized to make temp
 	 * tables.  We use a nonstandard error message here since "databasename:
@@ -3822,7 +3830,10 @@ InitTempTableNamespace(void)
 (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
  errmsg("cannot create temporary tables during a parallel operation")));
 
-	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
+	if (ActiveSession)
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d_%s", MyBackendId, ActiveSession->id);
+	else
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);

Re: Built-in connection pooling

2018-04-06 Thread Konstantin Knizhnik
Attached please find a new version of the patch with several bug fixes plus 
support for more than one session pool associated with different ports.
Now it is possible to make the postmaster listen on several ports for accepting 
pooled connections, while leaving the main Postgres port for dedicated backends.
Each session pool is intended to be used for a particular database/user 
combination.
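
For readers of the archive: a hypothetical sketch of what such a setup could
look like in postgresql.conf, reusing the session_pool_size and
session_pool_ports GUC names from the patch; the exact syntax and semantics
may differ between patch versions:

port = 5432			# main port, served by dedicated backends
session_pool_size = 10		# pooled backends per database/user pool
session_pool_ports = 2		# additional ports accepting pooled connections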


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 93c4bbf..dfc072c 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -194,6 +194,7 @@ char	   *namespace_search_path = NULL;
 /* Local functions */
 static void recomputeNamespacePath(void);
 static void InitTempTableNamespace(void);
+static Oid  GetTempTableNamespace(void);
 static void RemoveTempRelations(Oid tempNamespaceId);
 static void RemoveTempRelationsCallback(int code, Datum arg);
 static void NamespaceCallback(Datum arg, int cacheid, uint32 hashvalue);
@@ -441,9 +442,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (strcmp(newRelation->schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(newRelation->schemaname, false);
@@ -452,9 +451,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 	else if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 	else
 	{
@@ -463,8 +460,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -2902,9 +2898,7 @@ LookupCreationNamespace(const char *nspname)
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 
 	namespaceId = get_namespace_oid(nspname, false);
@@ -2967,9 +2961,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (strcmp(schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(schemaname, false);
@@ -2982,8 +2974,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -3250,8 +3241,11 @@ void
 SetTempNamespaceState(Oid tempNamespaceId, Oid tempToastNamespaceId)
 {
 	/* Worker should not have created its own namespaces ... */
-	Assert(myTempNamespace == InvalidOid);
-	Assert(myTempToastNamespace == InvalidOid);
+	if (!ActiveSession)
+	{
+		Assert(myTempNamespace == InvalidOid);
+		Assert(myTempToastNamespace == InvalidOid);
+	}
 	Assert(myTempNamespaceSubID == InvalidSubTransactionId);
 
 	/* Assign same namespace OIDs that leader has */
@@ -3771,6 +3765,22 @@ recomputeNamespacePath(void)
 	list_free(oidlist);
 }
 
+static Oid
+GetTempTableNamespace(void)
+{
+	if (ActiveSession)
+	{
+		if (!OidIsValid(ActiveSession->tempNamespace))
+			InitTempTableNamespace();
+	}
+	else
+	{
+		if (!OidIsValid(myTempNamespace))
+			InitTempTableNamespace();
+	}
+	return myTempNamespace;
+}
+
 /*
  * InitTempTableNamespace
  *		Initialize temp table namespace on first use in a particular backend
@@ -3782,8 +3792,6 @@ InitTempTableNamespace(void)
 	Oid			namespaceId;
 	Oid			toastspaceId;
 
-	Assert(!OidIsValid(myTempNamespace));
-
 	/*
 	 * First, do permission check to see if we are authorized to make temp
 	 * tables.  We use a nonstandard error message here since "databasename:
@@ -3822,7 +3830,10 @@ InitTempTableNamespace(void)
 (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
  errmsg("cannot create temporary tables during a parallel operation")));
 
-	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
+	if (ActiveSession)
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d_%s", MyBackendId, ActiveSession->id);
+	else
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
 
 	

Re: Built-in connection pooling

2018-02-09 Thread Ryan Pedela
On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky  wrote:

> Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
> like to go back to the basic on this.
>
> Unless I'm mistaken, at least in the Java and .NET world, clients are
> almost always expected to have their own connection pooling, either
> implemented inside the driver (ADO.NET model) or as a separate modular
> component (JDBC). This approach has a few performance advantages:
>
> 1. "Opening" a new pooled connection is virtually free - no TCP connection
> needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
> synchronization).
> 2. Important client state can be associated to physical connections. For
> example, prepared statements can be tracked on the physical connection, and
> persisted when the connection is returned to the pool. The next time the
> physical connection is returned from the pool, if the user tries to
> server-prepare a statement, we can check on the connection if it has
> already been prepared in a "previous lifetime", and if so, no need to
> prepare again. This is vital for scenarios with short-lived (pooled)
> connections, such as web. Npgsql does this.
>
> Regarding the problem of idle connections being kept open by clients, I'd
> argue it's a client-side problem. If the client is using a connection pool,
> the pool should be configurable to close idle connections after a certain
> time (I think this is relatively standard behavior). If the client isn't
> using a pool, it seems to be the application's responsibility to release
> connections when they're no longer needed.
>
> The one drawback is that the pooling is application-specific, so it can't
> be shared by multiple applications/hosts. So in some scenarios it may make
> sense to use both client pooling and proxy/server pooling.
>
> To sum it up, I would argue that connection pooling should first and
> foremost be considered as a client feature, rather than a proxy feature
> (pgpool) or server feature (the PostgreSQL pooling being discussed here).
> This isn't to say server-side pooling has no value though.
>

Recently, I did a large amount of parallel data processing where the
results were stored in PG. I had about 1000 workers each with their own PG
connection. As you pointed out, application pooling doesn't make sense in
this scenario. I tried pgpool and pgbouncer, and both ended up as the
bottleneck. Overall throughput was not great but it was highest without a
pooler. That aligns with Konstantin's benchmarks too. As far as I know,
server pooling is the only solution to increase throughput, without
upgrading hardware, for this use case.

I hope this PR gets accepted!


Re: Built-in connection pooling

2018-02-09 Thread Shay Rojansky
Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
like to go back to the basic on this.

Unless I'm mistaken, at least in the Java and .NET world, clients are
almost always expected to have their own connection pooling, either
implemented inside the driver (ADO.NET model) or as a separate modular
component (JDBC). This approach has a few performance advantages:

1. "Opening" a new pooled connection is virtually free - no TCP connection
needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
synchronization).
2. Important client state can be associated to physical connections. For
example, prepared statements can be tracked on the physical connection, and
persisted when the connection is returned to the pool. The next time the
physical connection is returned from the pool, if the user tries to
server-prepare a statement, we can check on the connection if it has
already been prepared in a "previous lifetime", and if so, no need to
prepare again. This is vital for scenarios with short-lived (pooled)
connections, such as web. Npgsql does this.

Regarding the problem of idle connections being kept open by clients, I'd
argue it's a client-side problem. If the client is using a connection pool,
the pool should be configurable to close idle connections after a certain
time (I think this is relatively standard behavior). If the client isn't
using a pool, it seems to be the application's responsibility to release
connections when they're no longer needed.

The one drawback is that the pooling is application-specific, so it can't
be shared by multiple applications/hosts. So in some scenarios it may make
sense to use both client pooling and proxy/server pooling.

To sum it up, I would argue that connection pooling should first and
foremost be considered as a client feature, rather than a proxy feature
(pgpool) or server feature (the PostgreSQL pooling being discussed here).
This isn't to say server-side pooling has no value though.


Re: Built-in connection pooling

2018-02-09 Thread Konstantin Knizhnik
Attached please find new version of built-in connection pooling 
supporting temporary tables and session GUCs.

Also Win32 support was added.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 93c4bbf..dfc072c 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -194,6 +194,7 @@ char	   *namespace_search_path = NULL;
 /* Local functions */
 static void recomputeNamespacePath(void);
 static void InitTempTableNamespace(void);
+static Oid  GetTempTableNamespace(void);
 static void RemoveTempRelations(Oid tempNamespaceId);
 static void RemoveTempRelationsCallback(int code, Datum arg);
 static void NamespaceCallback(Datum arg, int cacheid, uint32 hashvalue);
@@ -441,9 +442,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (strcmp(newRelation->schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(newRelation->schemaname, false);
@@ -452,9 +451,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 	else if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 	else
 	{
@@ -463,8 +460,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -2902,9 +2898,7 @@ LookupCreationNamespace(const char *nspname)
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 
 	namespaceId = get_namespace_oid(nspname, false);
@@ -2967,9 +2961,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (strcmp(schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(schemaname, false);
@@ -2982,8 +2974,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -3250,8 +3241,11 @@ void
 SetTempNamespaceState(Oid tempNamespaceId, Oid tempToastNamespaceId)
 {
 	/* Worker should not have created its own namespaces ... */
-	Assert(myTempNamespace == InvalidOid);
-	Assert(myTempToastNamespace == InvalidOid);
+	if (!ActiveSession)
+	{
+		Assert(myTempNamespace == InvalidOid);
+		Assert(myTempToastNamespace == InvalidOid);
+	}
 	Assert(myTempNamespaceSubID == InvalidSubTransactionId);
 
 	/* Assign same namespace OIDs that leader has */
@@ -3771,6 +3765,22 @@ recomputeNamespacePath(void)
 	list_free(oidlist);
 }
 
+static Oid
+GetTempTableNamespace(void)
+{
+	if (ActiveSession)
+	{
+		if (!OidIsValid(ActiveSession->tempNamespace))
+			InitTempTableNamespace();
+	}
+	else
+	{
+		if (!OidIsValid(myTempNamespace))
+			InitTempTableNamespace();
+	}
+	return myTempNamespace;
+}
+
 /*
  * InitTempTableNamespace
  *		Initialize temp table namespace on first use in a particular backend
@@ -3782,8 +3792,6 @@ InitTempTableNamespace(void)
 	Oid			namespaceId;
 	Oid			toastspaceId;
 
-	Assert(!OidIsValid(myTempNamespace));
-
 	/*
 	 * First, do permission check to see if we are authorized to make temp
 	 * tables.  We use a nonstandard error message here since "databasename:
@@ -3822,7 +3830,10 @@ InitTempTableNamespace(void)
 (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
  errmsg("cannot create temporary tables during a parallel operation")));
 
-	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
+	if (ActiveSession)
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d_%s", MyBackendId, ActiveSession->id);
+	else
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
 
 	namespaceId = get_namespace_oid(namespaceName, true);
 	if (!OidIsValid(namespaceId))
@@ -3854,8 +3865,10 @@ InitTempTableNamespace(void)
 	 * it. (We assume there is no need to clean it out if it does exist, since
 	 * dropping a parent 

Re: Built-in connection pooling

2018-02-02 Thread Vladimir Sitnikov
Konstantin> I do not have an explanation of the performance degradation in
case of this particular workload.

A) Mongo Java Client uses a connection pool of 100 connections by default.
That is, it does not follow "connection per client" (in YCSB terms), but is
capped at 100 connections. I think it can be adjusted by adding
?maxPoolSize=100500 or ?maxpoolsize=100500 to the Mongo URL

I wonder if you could try to vary that parameter and see if it changes
Mongo results.

B) There's a bug in JDBC client of YCSB (it might affect PostgreSQL
results, however I'm not sure if the impact would be noticeable). The
default configuration is readallfields=true, however Jdbc client just
discards the results instead of accessing the columns. I've filed
https://github.com/brianfrankcooper/YCSB/issues/1087 for that.


C) I might miss something, however my local (Macbook) benchmarks show that
PostgreSQL 9.6 somehow uses Limit->Sort->BitmapScan kind of plans.
I have picked a "bad" userid value via auto_explain.
Jdbc client uses prepared statements, so a single bind might spoil the
whole thing causing bad plans for all the values afterwards.
Does it make sense to disable bitmap scan somehow?

For instance:

explain (analyze, buffers) select * From usertable where
YCSB_KEY>='user884845140610037639' order by YCSB_KEY limit 100;
  QUERY PLAN
---
 Limit  (cost=320.99..321.24 rows=100 width=1033) (actual time=1.408..1.429
rows=100 loops=1)
   Buffers: shared hit=140
   ->  Sort  (cost=320.99..321.33 rows=135 width=1033) (actual
time=1.407..1.419 rows=100 loops=1)
 Sort Key: ycsb_key
 Sort Method: quicksort  Memory: 361kB
 Buffers: shared hit=140
 ->  Bitmap Heap Scan on usertable  (cost=9.33..316.22 rows=135
width=1033) (actual time=0.186..0.285 rows=167 loops=1)
   Recheck Cond: ((ycsb_key)::text >=
'user884845140610037639'::text)
   Heap Blocks: exact=137
   Buffers: shared hit=140
   ->  Bitmap Index Scan on usertable_pkey  (cost=0.00..9.29
rows=135 width=0) (actual time=0.172..0.172 rows=167 loops=1)
 Index Cond: ((ycsb_key)::text >=
'user884845140610037639'::text)
 Buffers: shared hit=3
 Planning time: 0.099 ms
 Execution time: 1.460 ms

vs

explain (analyze, buffers) select * From usertable where
YCSB_KEY>='user184845140610037639' order by YCSB_KEY limit 100;
QUERY PLAN
---
 Limit  (cost=0.28..89.12 rows=100 width=1033) (actual time=0.174..0.257
rows=100 loops=1)
   Buffers: shared hit=102
   ->  Index Scan using usertable_pkey on usertable  (cost=0.28..2154.59
rows=2425 width=1033) (actual time=0.173..0.246 rows=100 loops=1)
 Index Cond: ((ycsb_key)::text >= 'user184845140610037639'::text)
 Buffers: shared hit=102
 Planning time: 0.105 ms
 Execution time: 0.277 ms
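
(For reference, the experiment suggested above can be done per session with
the stock planner GUC, e.g.:

SET enable_bitmapscan = off;
)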

Vladimir


Re: Built-in connection pooling

2018-02-02 Thread Konstantin Knizhnik



On 01.02.2018 23:28, Vladimir Sitnikov wrote:

> config/pgjsonb-local.dat

Do you use standard "workload" configuration values? 
(e.g. recordcount=1000, maxscanlength=100)


Yes, I used the default values for the workload. For example, workload A has the 
following settings:


# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0

requestdistribution=zipfian




Could you share ycsb output (e.g. for workload a)?
I mean lines like
[TOTAL_GC_TIME], Time(ms), xxx
[TOTAL_GC_TIME_%], Time(%), xxx


$ cat results/last/run_pgjsonb-local_workloada_70_bt.out
[OVERALL], RunTime(ms), 60099.0
[OVERALL], Throughput(ops/sec), 50444.83269272367
[TOTAL_GCS_PS_Scavenge], Count, 6.0
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 70.0
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.11647448376844872
[TOTAL_GCS_PS_MarkSweep], Count, 0.0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0.0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 6.0
[TOTAL_GC_TIME], Time(ms), 70.0
[TOTAL_GC_TIME_%], Time(%), 0.11647448376844872
[READ], Operations, 1516174.0
[READ], AverageLatency(us), 135.802076146933
[READ], MinLatency(us), 57.0
[READ], MaxLatency(us), 23327.0
[READ], 95thPercentileLatency(us), 382.0
[READ], 99thPercentileLatency(us), 828.0
[READ], Return=OK, 1516174
[CLEANUP], Operations, 70.0
[CLEANUP], AverageLatency(us), 134.21428571428572
[CLEANUP], MinLatency(us), 55.0
[CLEANUP], MaxLatency(us), 753.0
[CLEANUP], 95thPercentileLatency(us), 728.0
[CLEANUP], 99thPercentileLatency(us), 750.0
[UPDATE], Operations, 1515510.0
[UPDATE], AverageLatency(us), 2622.6653258639008
[UPDATE], MinLatency(us), 86.0
[UPDATE], MaxLatency(us), 1059839.0
[UPDATE], 95thPercentileLatency(us), 1261.0
[UPDATE], 99thPercentileLatency(us), 87039.0
[UPDATE], Return=OK, 1515510


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-02-01 Thread Konstantin Knizhnik



On 01.02.2018 16:33, Vladimir Sitnikov wrote:

Konstantin> I have not built YCSB myself; I used an existing installation.

Which pgjdbc version was in use?


postgresql-9.4.1212.jar

Konstantin> One of the main problems of Postgres is the significant 
degradation of performance in case of concurrent write access by multiple 
transactions to the same rows.


I would consider that a workload "problem" rather than PostgreSQL problem.
That is, if an application (e.g. YCSB) is trying to update the same 
rows in multiple transactions concurrently, then the outcome of such 
updates is likely to be unpredictable. Does it make sense?


I can't agree with you.
Yes, there are workloads where updates are more or less local: clients 
mostly update their own private data.
But there are many systems with "shared" resources which are 
concurrently accessed by different users. They may just increment an access 
count or perform a deposit/withdrawal...
Just a simple example: consider that you have something like an app store 
and there is some popular application which is bought by a lot of users.
From the DBMS point of view, a lot of clients perform concurrent updates of 
the same record.
So performance on such a workload is also very important. And 
unfortunately here Postgres loses the competition with MySQL and most 
other DBMSes.




At least, I do not see why Mongo would degrade in a different way 
there. Oleg's charts suggest that Mongo does not degrade there, so I 
wonder if we compare apples to apples in the first place.


Postgres locks tuples in a very inefficient way in case of high contention.
It first locks the buffer and checks if the tuple is locked by some other 
backend.
Then it tries to take a heavyweight lock on the tuple's ctid. If there are 
several processes trying to update this tuple, then all of them will be 
queued on this heavyweight tuple lock.
After getting this tuple lock, the backend tries to lock the xid of the 
transaction which updated the tuple.
Once the transaction that updated this tuple is completed, Postgres unblocks 
the backends waiting for this transaction. It checks the status of the tuple, 
releases the tuple's lock, and awakens one of the waiting clients.
Since Postgres uses MVCC, it creates a new version of the tuple on 
each update.
So the tuple all the clients are waiting for is not the last version of 
the tuple any more.
Depending on the isolation policy, they either need to report an error (in 
case of repeatable read) or update the snapshot and repeat the search with 
the new snapshot...

and perform all the checks and locks mentioned above once again.

I hope it is clear from this brief and not so precise explanation 
that Postgres has to do a lot of redundant work if several clients are 
competing for the same tuple.
There is a well-known rule that pessimistic locking is more efficient than 
optimistic locking in case of high contention.
So Postgres could provide better performance on this workload if it were 
more pessimistic:
take the lock not on the ctid (the identifier of a particular tuple version), 
but on the tuple's PK (primary key), and hold it till the end of the 
transaction (because until the transaction is completed, nobody else will be 
able to update this tuple). This trick with locking the PK really helps to 
improve performance on this workload, but unfortunately it cannot reverse 
the trend of degrading performance with an increasing number of 
competing transactions.
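
As an application-level illustration of the same pessimistic idea (a sketch 
only, assuming an already-open libpq connection and a hypothetical account 
table; the in-server variant described above would take the lock internally 
rather than via SQL, and error handling is omitted here):

#include <libpq-fe.h>

static void
debit_account(PGconn *conn)
{
	/*
	 * Take the row lock by primary key up front, so competing writers
	 * queue on this lock until commit instead of each discovering a
	 * concurrently updated tuple and retrying.
	 */
	PQclear(PQexec(conn, "BEGIN"));
	PQclear(PQexec(conn, "SELECT 1 FROM account WHERE id = 42 FOR UPDATE"));
	PQclear(PQexec(conn, "UPDATE account SET balance = balance - 100 WHERE id = 42"));
	PQclear(PQexec(conn, "COMMIT"));
}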



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-02-01 Thread Vladimir Sitnikov
Konstantin> I have not built YCSB myself; I used an existing installation.

Which pgjdbc version was in use?

Konstantin> One of the main problems of Postgres is the significant
degradation of performance in case of concurrent write access by multiple
transactions to the same rows.

I would consider that a workload "problem" rather than PostgreSQL problem.
That is, if an application (e.g. YCSB) is trying to update the same rows in
multiple transactions concurrently, then the outcome of such updates is
likely to be unpredictable. Does it make sense?

At least, I do not see why Mongo would degrade in a different way there.
Oleg's charts suggest that Mongo does not degrade there, so I wonder if we
compare apples to apples in the first place.

Vladimir


Re: Built-in connection pooling

2018-02-01 Thread Konstantin Knizhnik



On 01.02.2018 15:21, Vladimir Sitnikov wrote:
Konstantin>I have obtained more results with YCSB benchmark and 
built-in connection pooling


Could you provide more information on the benchmark setup you have used?
For instance: benchmark library versions, PostgreSQL client version, 
additional/default benchmark parameters.




I am using the latest Postgres sources with the connection pooling 
patch applied.

I have not built YCSB myself; I used an existing installation.

To launch tests I used the following YCSB command line:

To load data:
YCSB_MAXRUNTIME=60 YCSB_OPS=10 YCSB_DBS="pgjsonb-local" 
YCSB_CFG="bt" YCSB_CLIENTS="250" YCSB_WORKLOADS="load_a" ./ycsb.sh


To run test:
YCSB_MAXRUNTIME=60 YCSB_OPS=10 YCSB_DBS="pgjsonb-local" 
YCSB_CFG="bt" YCSB_CLIENTS="250 500 750 1000" YCSB_WORKLOADS="run_a" 
./ycsb.sh


$ cat config/pgjsonb-local.dat
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost:5432/ycsb
db.user=ycsb
db.passwd=ycsb
db.batchsize=100
jdbc.batchupdateapi=true
table=usertable

Konstantin> Postgres shows a significant slowdown with an increasing number 
of connections in case of conflicting updates.

Konstantin> Built-in connection pooling can somewhat mitigate this problem

Can you please clarify how connection pooling eliminates slow down?
Is the case as follows?
1) The application updates multiple of rows in a single transaction
2) There are multiple concurrent threads
3) The threads update the same rows at the same time

If that is the case, then the actual workload is different each time 
you vary connection pool size.

For instance, if you use 1 thread, then the writes become uncontended.

Of course, you might use just it as a "black box" workload, however I 
wonder if that kind of workload ever appears in a real-life 
applications. I would expect for the applications to update the same 
row multiple times, however I would expect the app is doing subsequent 
updates, not the concurrent ones.


On the other hand, as you vary the pool size, the workload varies as 
well (the resulting database contents is different), so it looks like 
comparing apples to oranges.


Vladimir


Sorry, I am not sure that I completely understand your question.
The YCSB (Yahoo! Cloud Serving Benchmark) framework is essentially a 
multi-client benchmark which assumes a large number of concurrent requests 
to the database.
The requests themselves tend to be very simple (the benchmark emulates a 
key-value store).
In my tests I perform measurements for 250, 500, 750 and 1000 connections.

One of the main problems of Postgres is the significant degradation of 
performance in case of concurrent write access by multiple transactions 
to the same rows.
This is why the performance of pgbench and the YCSB benchmark degrades 
significantly (more than linearly) with an increasing number of client 
connections, especially in case of a Zipf distribution

(which significantly increases the possibility of conflict).

Connection pooling allows fixing the number of backends and serving almost 
any number of connections with a fixed set of backends.

So the results are almost the same for 250, 500, 750 and 1000 connections.
The problem is choosing the optimal number of backends.

For read-only pgbench the best results are achieved with 300 backends; for 
YCSB with 5% updates, with 70 backends; for YCSB with 50% updates, 
with 30 backends.

So something definitely needs to be changed in the Postgres locking mechanism.

Connection pooling allows minimizing contention on resources and the 
degradation of performance caused by such contention.
But unfortunately it is not a silver bullet fixing all Postgres 
scalability problems.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-02-01 Thread Vladimir Sitnikov
Konstantin>I have obtained more results with YCSB benchmark and built-in
connection pooling

Could you provide more information on the benchmark setup you have used?
For instance: benchmark library versions, PostgreSQL client version,
additional/default benchmark parameters.

Konstantin> Postgres shows a significant slowdown with an increasing number of
connections in case of conflicting updates.
Konstantin> Built-in connection pooling can somewhat mitigate this problem

Can you please clarify how connection pooling eliminates slow down?
Is the case as follows?
1) The application updates multiple of rows in a single transaction
2) There are multiple concurrent threads
3) The threads update the same rows at the same time

If that is the case, then the actual workload is different each time you
vary connection pool size.
For instance, if you use 1 thread, then the writes become uncontended.

Of course, you might use just it as a "black box" workload, however I
wonder if that kind of workload ever appears in a real-life applications. I
would expect for the applications to update the same row multiple times,
however I would expect the app is doing subsequent updates, not the
concurrent ones.

On the other hand, as you vary the pool size, the workload varies as well
(the resulting database contents is different), so it looks like comparing
apples to oranges.

Vladimir



Re: Built-in connection pooling

2018-02-01 Thread Konstantin Knizhnik
I have obtained more results with the YCSB benchmark and built-in connection 
pooling.
An explanation of the benchmark and all results for vanilla Postgres and 
Mongo are available in Oleg Bartunov's presentation about JSON (at the 
end of the presentation):


http://www.sai.msu.su/~megera/postgres/talks/sqljson-pgconf.eu-2017.pdf

As you can see, Postgres shows a significant slowdown with an increasing 
number of connections in case of conflicting updates.

Built-in connection pooling can somewhat mitigate this problem:

Workload-B (5% of updates) ops/sec:


Session pool size \ clients      250      500      750     1000
        0                     151511    78078    48742    30186
       32                     522347   543863   546971   540462
       64                     736323   770010   763649   763358
      128                     245167   241377   243322   232482
      256                     144964   146723   149317   141049



Here the maximum is obtained near 70 backends, which corresponds to the 
number of physical cores on the target system.



But for workload A (50% of updates), the optimum is achieved at a much 
smaller number of backends, after which we get very fast performance 
degradation:



Session pool size    kops/sec
        16              220
        30              353
        32              362
        40              120
        70               53
       256               20




Here the maximum is reached at 32 backends, and with 70 backends 
performance is 6 times worse.
It means that it is difficult to find the optimal session pool size if we 
have a varying workload.
If we set it too large, then we get high contention from conflicting 
update queries; if it is too small, then we do not utilize all system 
resources on read-only or non-conflicting queries.


Looks like we have to do something with the Postgres locking mechanism and 
maybe implement some contention-aware scheduler, as described here:


http://www.vldb.org/pvldb/vol11/p648-tian.pdf

But this is a different story, not related to built-in connection pooling.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-01-29 Thread Vladimir Sitnikov
Bruce>Well, we could have the connection pooler disconnect those, right?

I agree. Do you think we could rely on all the applications being
configured in a sane way?
A fallback configuration at the DB level could still be useful to ensure the DB
keeps running in case multiple applications access it. It might be
non-trivial to ensure proper configurations across all the apps.

What I do like is that the behaviour of dropping connections should already be
handled by existing applications, so it should fit naturally with the
existing apps.

An alternative approach might be to dump relevant resources for inactive
sessions to disk, so the session could be recreated in case the connection
is requested again after a long pause (e.g. re-prepare all the statements);
however, that sounds scary.

Vladimir


Re: Built-in connection pooling

2018-01-29 Thread Bruce Momjian
On Mon, Jan 29, 2018 at 04:02:22PM +, Vladimir Sitnikov wrote:
> Bruce>Yes, it would impact applications and you are right most applications
> could not handle that cleanly.
> 
> I would disagree here.
> We are discussing applications that produce "lots of idle" connections, aren't
> we? That typically comes from an application-level connection pool.
> Most of the connection pools have a setting that would "validate" a
> connection in case it was not used for a certain period of time.
> 
> That plays nicely in case the server drops an "idle, not in a transaction"
> connection.

Well, we could have the connection pooler disconnect those, right?

> Of course, there are cases when an application just grabs a connection from a
> pool and uses it in a non-transacted way (e.g. does some action once an hour and
> commits immediately). However, that kind of application would already face
> firewalls, etc. I mean the application should already be prepared to handle
> "network issues".
> 
> Bruce> It is probably better to look into
> Bruce>freeing resources for idle connections instead and keep the socket open.
> 
> The application might expect the session-specific data to be present, so it
> might be even worse if the database deallocates everything but the TCP
> connection.
> 
> For instance: the application might expect the server-prepared statements to
> be there. Would you deallocate server-prepared statements for those "idle"
> connections? The app would just break. There's no way (currently) for the
> application to know that the statement expired unexpectedly.

I don't know what we would deallocate yet.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-01-29 Thread Vladimir Sitnikov
Bruce>Yes, it would impact applications and you are right most applications
could not handle that cleanly.

I would disagree here.
We are discussing applications that produce "lots of idle" connections,
aren't we? That typically comes from an application-level connection pool.
Most of the connection pools have a setting that would "validate" a
connection in case it was not used for a certain period of time.

That plays nicely in case the server drops an "idle, not in a transaction"
connection.

Of course, there are cases when an application just grabs a connection from
a pool and uses it in a non-transacted way (e.g. does some action once an
hour and commits immediately). However, that kind of application would
already face firewalls, etc. I mean the application should already be
prepared to handle "network issues".

Bruce> It is probably better to look into
Bruce>freeing resources for idle connections instead and keep the socket
open.

The application might expect the session-specific data to be present,
so it might be even worse if the database deallocates everything but the
TCP connection.

For instance: the application might expect the server-prepared statements
to be there. Would you deallocate server-prepared statements for those
"idle" connections? The app would just break. There's no way (currently)
for the application to know that the statement expired unexpectedly.
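
The best a driver could do today is react after the fact. Here is a rough
libpq sketch (the helper is invented for illustration; it assumes the
server reports SQLSTATE 26000, invalid_sql_statement_name, and that we are
not inside an explicit transaction, which the error would have aborted):

    #include <string.h>
    #include <libpq-fe.h>

    static PGresult *
    exec_with_reprepare(PGconn *conn, const char *name, const char *sql,
                        int nparams, const char *const *values)
    {
        PGresult *res = PQexecPrepared(conn, name, nparams, values,
                                       NULL, NULL, 0);

        if (PQresultStatus(res) == PGRES_FATAL_ERROR)
        {
            const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

            /* 26000 = invalid_sql_statement_name: the statement is gone */
            if (sqlstate && strcmp(sqlstate, "26000") == 0)
            {
                PQclear(res);
                /* re-create the statement and retry once
                 * (error handling of the re-prepare elided) */
                PQclear(PQprepare(conn, name, sql, nparams, NULL));
                res = PQexecPrepared(conn, name, nparams, values,
                                     NULL, NULL, 0);
            }
        }
        return res;
    }

And even this pushes the burden onto every driver and application, so it
is a workaround, not a fix.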

Vladimir


Re: Built-in connection pooling

2018-01-29 Thread Bruce Momjian
On Mon, Jan 29, 2018 at 11:57:36AM +0300, Konstantin Knizhnik wrote:
> Right now, if you hit max_connections, we start rejecting new
> connections.  Would it make sense to allow an option to exit idle
> connections when this happens so new users can connect?
> 
> It will require changes in client applications, won't it? They should be
> ready for the connection to be dropped by the server at any moment.
> I do not know whether it is possible to drop an idle connection and hide this
> fact from the client. In my implementation each session keeps the minimal
> information required for interaction with the client (session context).  It
> includes the socket, struct Port and a session memory context which should be
> used instead of TopMemoryContext for session-specific data.

Yes, it would impact applications, and you are right that most applications
could not handle that cleanly.  It is probably better to look into
freeing resources for idle connections instead and keep the socket open.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-01-29 Thread Konstantin Knizhnik



On 28.01.2018 03:40, Bruce Momjian wrote:

On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:

Yes, external connection pooling is more flexible. It makes it possible to
perform pooling either at the client side or at the server side (or even to
combine the two approaches).
Also, external connection pooling for PostgreSQL is not limited to
pgbouncer/pgpool.
There are many frameworks maintaining their own connection pool, for
example J2EE, jboss, hibernate, ...
I have a feeling that about 70% of enterprise systems working with
databases are written in Java and do connection pooling in their
own way.

True, but that does not really mean we don't need "our" connection
pooling (built-in or not). The connection pools are usually built into
the application servers, so each application server has its own
independent pool. With larger deployments (a couple of application
servers) that quickly causes problems with max_connections.

I found this thread and the pthread thread very interesting.  Konstantin,
thank you for writing prototypes and giving us very useful benchmarks
for ideas I thought I might never see.

As much as I would like to move forward with coding, I would like to
back up and understand where we need to go with these ideas.

First, it looks like pthreads and a builtin pooler help mostly with
1000+ connections.  It seems like you found that pthreads wasn't
sufficient and the builtin pooler was better.  Is that correct?


The brief answer is yes.
Pthreads allow us to minimize per-connection overhead and make it possible 
to obtain better results for a large number of connections.
But there is a principal problem: a Postgres connection is a "heavyweight" 
object: each connection maintains its own private cache of the catalog, 
relations, temporary table pages, prepared statements, ... So even though 
pthreads allow us to minimize per-connection memory usage, that saving is 
negligible compared with all these connection-private memory resources.  
It means that we still need to use connection pooling.


Pthreads provide two main advantages:
1. Simplified interaction between different workers: no need to use shared 
memory with its fixed size limitation and the impossibility of using normal 
pointers in dynamic shared memory. Also no need to implement a specialized 
memory allocator for shared memory.
It makes implementation of parallel query execution and built-in 
connection pooling much easier.
2. Optimized virtual-to-physical address translation. There is no need to 
maintain a separate address space for each backend, so the TLB (translation 
lookaside buffer) becomes more efficient.


So it is not completely correct to consider session pooling as an 
alternative to pthreads.

Ideally these two approaches should be combined.



Is there anything we can do differently about allowing long-idle
connections to reduce their resource usage, e.g. free their caches?
Remove from PGPROC?  Could we do it conditionally, e.g. only sessions
that don't have open transactions or cursors?

I think that the best approach is to switch to global (shared) caches 
for execution plans, the catalog, ...
Most of the time these metadata caches are identical for all clients, so 
it is just a waste of memory and time to maintain them separately in each 
backend.
Certainly a shared cache requires some synchronization, which can be a 
point of contention and cause significant performance degradation.
But taking into account that metadata is updated much more rarely than 
data, I hope that using copy-on-write and atomic operations can help to 
solve this problem.
And it can give a lot of other advantages. For example, it will be 
possible to spend more time in the optimizer detecting the optimal 
execution plan, and to store plans manually for future use.
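
Just to illustrate the direction (this is not the patch, only a toy sketch
in plain C11 atomics; in Postgres the cache would have to live in shared
memory with its own allocator, and safe reclamation of the old version,
the really hard part, is omitted here):

    #include <stdatomic.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct PlanCache
    {
        int     nplans;
        /* ... cached metadata/plans would live here ... */
    } PlanCache;

    /* assumed to be pointed at an initial version during startup */
    static _Atomic(PlanCache *) shared_cache;

    /* Reader: one atomic load yields an immutable snapshot, no locks. */
    static const PlanCache *
    cache_snapshot(void)
    {
        return atomic_load_explicit(&shared_cache, memory_order_acquire);
    }

    /* Writer: copy, modify the copy, publish with CAS; retry on a race. */
    static void
    cache_update(void (*modify)(PlanCache *))
    {
        for (;;)
        {
            PlanCache *old = atomic_load_explicit(&shared_cache,
                                                  memory_order_acquire);
            PlanCache *new = malloc(sizeof(PlanCache));

            memcpy(new, old, sizeof(PlanCache));
            modify(new);
            if (atomic_compare_exchange_strong(&shared_cache, &old, new))
                break;      /* published; old copy must outlive its readers */
            free(new);      /* lost the race; rebuild from the new current */
        }
    }

Since metadata changes rarely, writers almost never race and readers pay
only the atomic load.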


It feels like user and db mismatches are always going to cause pooling
problems.  Could we actually exit and restart connections that have
default session state?


Well, combining multi-user access and connection pooling is really a 
challenging problem.
I do not know the best solution for it now. It would be much simpler to 
find a solution with the pthreads model...


Most enterprise systems are using pgbouncer or a similar connection 
pooler. In pgbouncer's statement/transaction pooling mode, access to the 
database is performed under the same user. So many existing systems are 
built on the assumption that the database is accessed in this manner.


Concerning "default session state": one of the main drawbacks of 
pgbouncer and other external poolers is that them do not allow to use 
prepared statements.
And it leads to up to two times performance penalty on typical OLTP 
queries. One of the main ideads of built-on session pooling was to 
eliminate such limitation.
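
To make the difference concrete, the two client paths look roughly like
this in libpq (illustrative only; "q1" and pgbench_accounts are just
familiar example names):

    #include <libpq-fe.h>

    static void
    run_both_paths(PGconn *conn, const char *aid)
    {
        const char *params[1] = { aid };

        /* Unprepared: the query text is parsed and planned on every call. */
        PQclear(PQexecParams(conn,
                    "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                    1, NULL, params, NULL, NULL, 0));

        /* Prepared: parse/plan once, then only bind/execute per call --
         * the saving behind the "almost twice" figure on simple queries. */
        PQclear(PQprepare(conn, "q1",
                    "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                    1, NULL));
        PQclear(PQexecPrepared(conn, "q1", 1, params, NULL, NULL, 0));
    }

With a statement/transaction pooler only the first form is safe, because
the next query may land on a backend where "q1" was never prepared.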




Right now, if you hit max_connections, we start rejecting new
connections.  Would it make sense to allow an option to exit idle
connections when this happens so new users can connect?
It will require changes in client applications, won't it? They should 
be 

Re: Built-in connection pooling

2018-01-28 Thread Bruce Momjian
On Sun, Jan 28, 2018 at 03:11:25PM -0800, Ivan Novick wrote:
> > The simplest thing sounds like a GUC that will automatically end a
> > connection idle for X seconds.
> 
> Uh, we already have idle_in_transaction_session_timeout so we would just
> need a simpler version.
> 
> 
> Oh I see it's in 9.6, AWESOME! 

In summary, the good news is that adding an idle-session-timeout GUC, a
GUC to cancel idle connections when the max_connections limit is hit, and
a GUC for idle connections to reduce their resource usage shouldn't be too
hard to implement and would provide useful benefits.
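
The GUC plumbing itself is cheap. As a purely hypothetical sketch, an
extension could register such a knob like this (core would add it to the
guc.c tables instead; the name, variable, and enforcement are all invented
here):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/guc.h"
    #include <limits.h>

    PG_MODULE_MAGIC;

    static int idle_session_timeout_ms = 0;    /* 0 = disabled */

    void
    _PG_init(void)
    {
        DefineCustomIntVariable("pool.idle_session_timeout",
                                "Terminate sessions idle (outside a "
                                "transaction) for longer than this.",
                                NULL,
                                &idle_session_timeout_ms,
                                0,              /* boot value: disabled */
                                0, INT_MAX,
                                PGC_SIGHUP,
                                GUC_UNIT_MS,
                                NULL, NULL, NULL);
    }

The hard part is of course the enforcement path, not the definition.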

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-01-28 Thread Ivan Novick
> > The simplest thing sounds like a GUC that will automatically end a
> > connection idle for X seconds.
>
> Uh, we already have idle_in_transaction_session_timeout so we would just
> need a simpler version.
>

Oh I see it's in 9.6, AWESOME!

Cheers


Re: Built-in connection pooling

2018-01-28 Thread Ivan Novick
On Sat, Jan 27, 2018 at 4:40 PM, Bruce Momjian wrote:

> On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:
> Right now, if you hit max_connections, we start rejecting new
> connections.  Would it make sense to allow an option to exit idle
> connections when this happens so new users can connect?
>
>
A lot of users have bash scripts to check the system periodically and cancel
idle connections to prevent other users from getting rejected by
max_connections.  They do this on a timer, e.g. if the session appears to
have been idle for more than 10 minutes.


> I know we have relied on external connection poolers to solve all the
> high connection problems but it seems there might be simple things we
> can do to improve matters.  FYI, I did write a blog entry comparing
> external and internal connection poolers:
>

Yes, that would be great.

The simplest thing sounds like a GUC that will automatically end a
connection idle for X seconds.

Another option could be, as you suggested, Bruce: if a user would have
been rejected because max_connections was already reached, then terminate
the connection that has been idle the longest and allow the new connection
to come in.
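
The selection policy itself is trivial; something like this sketch over an
invented per-session table (nothing like this exists in core or in the
patch, it is just to show the shape):

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    typedef struct Session
    {
        bool    in_use;
        bool    idle;           /* connected, but no transaction open */
        time_t  last_activity;
    } Session;

    /* Pick the longest-idle session to terminate, or NULL if none is idle. */
    static Session *
    pick_eviction_victim(Session *sessions, size_t n)
    {
        Session *victim = NULL;

        for (size_t i = 0; i < n; i++)
        {
            if (!sessions[i].in_use || !sessions[i].idle)
                continue;       /* never evict an active transaction */
            if (victim == NULL ||
                sessions[i].last_activity < victim->last_activity)
                victim = &sessions[i];
        }
        return victim;
    }

The policy could also honour a minimum idle age, so a brief burst does not
kill connections that are about to be reused.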

These would greatly improve user experience as most folks have to automate
this all themselves anyway.

Cheers,
Ivan


Re: Built-in connection pooling

2018-01-27 Thread Bruce Momjian
On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:
> > Yes, external connection pooling is more flexible. It makes it possible to 
> > perform pooling either at the client side or at the server side (or even to 
> > combine the two approaches).
> > Also, external connection pooling for PostgreSQL is not limited to 
> > pgbouncer/pgpool.
> > There are many frameworks maintaining their own connection pool, for 
> > example J2EE, jboss, hibernate, ...
> > I have a feeling that about 70% of enterprise systems working with 
> > databases are written in Java and do connection pooling in their 
> > own way.
> 
> True, but that does not really mean we don't need "our" connection
> pooling (built-in or not). The connection pools are usually built into
> the application servers, so each application server has its own
> independent pool. With larger deployments (a couple of application
> servers) that quickly causes problems with max_connections.

I found this thread and the pthread thread very interesting.  Konstantin,
thank you for writing prototypes and giving us very useful benchmarks
for ideas I thought I might never see.

As much as I would like to move forward with coding, I would like to
back up and understand where we need to go with these ideas.

First, it looks like pthreads and a builtin pooler help mostly with
1000+ connections.  It seems like you found that pthreads wasn't
sufficient and the builtin pooler was better.  Is that correct?

Is there anything we can do differently about allowing long-idle
connections to reduce their resource usage, e.g. free their caches? 
Remove from PGPROC?  Could we do it conditionally, e.g. only sessions
that don't have open transactions or cursors?

It feels like user and db mismatches are always going to cause pooling
problems.  Could we actually exit and restart connections that have
default session state?

Right now, if you hit max_connections, we start rejecting new
connections.  Would it make sense to allow an option to exit idle
connections when this happens so new users can connect?

I know we have relied on external connection poolers to solve all the
high connection problems but it seems there might be simple things we
can do to improve matters.  FYI, I did write a blog entry comparing
external and internal connection poolers:

https://momjian.us/main/blogs/pgblog/2017.html#April_21_2017

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-01-22 Thread Tomas Vondra


On 01/22/2018 05:05 PM, Konstantin Knizhnik wrote:
> 
> 
> On 19.01.2018 20:28, Tomas Vondra wrote:
>>>
>>> With pgbouncer you will never be able to use prepared statements which
>>> slows down simple queries almost twice (unless my patch with
>>> autoprepared statements is committed).
>>>
>> I don't see why that wouldn't be possible? Perhaps not for prepared
>> statements with simple protocol, but I'm pretty sure it's doable for
>> extended protocol (which seems like a reasonable limitation).
>>
>> That being said, I think it's a mistake to turn this thread into a
>> pgbouncer vs. the world battle. I could name things that are possible
>> only with standalone connection pool - e.g. pausing connections and
>> restarting the database without interrupting the clients.
>>
>> But that does not mean built-in connection pool is not useful.
>>
>>
>> regards
>>
> 
> Sorry, I do not understand how the extended protocol can help to handle 
> prepared statements without a shared prepared statement cache or
> built-in connection pooling.
>

The extended protocol makes it easy for pgbouncer (or any other proxy)
to identify prepared statements, so that it can track (a) which prepared
statements a client defined, and (b) what prepared statements are
defined on a connection. And then do something when a client gets
assigned a connection missing some of those.

I do not claim doing this would be trivial, but I don't see why it would
be impossible.
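
Just to show the shape of that bookkeeping, with invented types and the
protocol I/O stubbed out (this is not pgbouncer code, and bounds checks
are elided):

    #include <stdbool.h>
    #include <string.h>

    typedef struct Stmt { const char *name; const char *query; } Stmt;

    typedef struct ClientState { Stmt stmts[64]; int nstmts; } ClientState;
    typedef struct ServerConn  { const char *prepared[64]; int nprepared; } ServerConn;

    static bool
    server_has_stmt(ServerConn *s, const char *name)
    {
        for (int i = 0; i < s->nprepared; i++)
            if (strcmp(s->prepared[i], name) == 0)
                return true;
        return false;
    }

    static void
    send_parse(ServerConn *s, const char *name, const char *query)
    {
        /* a real proxy would emit a Parse(name, query) message here */
        (void) query;
        s->prepared[s->nprepared++] = name;
    }

    /* Called when a pooled connection is assigned to a client: replay Parse
     * for every statement the client defined that this backend lacks. */
    static void
    sync_prepared_statements(ClientState *client, ServerConn *server)
    {
        for (int i = 0; i < client->nstmts; i++)
        {
            Stmt *st = &client->stmts[i];

            if (!server_has_stmt(server, st->name))
                send_parse(server, st->name, st->query);
        }
    }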

Of course, the built-in pool can handle this in different ways, as it
has access to the internal caches.

> The problem is that now in Postgres most caches, including the catalog
> cache, relation cache and prepared statement cache, are private to a backend.

True. I wouldn't say it's a "problem" but it's certainly a challenge for
certain features.

> There is certainly one big advantage of such an approach: no need to
> synchronize access to the cache. But it seems to be the only advantage.
> And there are a lot of drawbacks:
> inefficient use of memory, a complex invalidation mechanism,
> incompatibility with connection pooling...
> 

Perhaps. I personally see the minimal synchronization as a quite
valuable feature.

> So there are three possible ways (maybe more, but I know only three):
> 1. Implement built-in connection pooling which will be aware of proper
> use of local caches. This is what I have implemented with the proposed
> approach.
> 2. Implicit autoprepare. Clients will not be able to use the standard
> Postgres prepare mechanism, but the executor will try to generate a generic
> plan for ordinary queries. My implementation of this approach is at the
> commit fest.
> 3. Global caches. It seems to be the best solution but the most
> difficult to implement.
> 

Perhaps.

> Actually I think that the discussion about the value of built-in
> connection pooling is very important.

I agree, and I wasn't speaking against built-in connection pooling.

> Yes, external connection pooling is more flexible. It makes it possible to 
> perform pooling either at the client side or at the server side (or even to 
> combine the two approaches).
> Also, external connection pooling for PostgreSQL is not limited to 
> pgbouncer/pgpool.
> There are many frameworks maintaining their own connection pool, for 
> example J2EE, jboss, hibernate, ...
> I have a feeling that about 70% of enterprise systems working with 
> databases are written in Java and do connection pooling in their 
> own way.

True, but that does not really mean we don't need "our" connection
pooling (built-in or not). The connection pools are usually built into
the application servers, so each application server has its own
independent pool. With larger deployments (a couple of application
servers) that quickly causes problems with max_connections.

> So maybe embedded connection pooling is not needed for such
> applications...
>
> But what I have heard from many people is that Postgres' poor
> connection pooling is one of the main drawbacks of Postgres,
> complicating its usage in enterprise environments.
> 

Maybe. I'm sure there's room for improvement.

That being said, when enterprise developers tell me PostgreSQL is
missing some feature, 99% of the time it turns out they're doing
something quite stupid.

> In any case please find updated patch with some code cleanup and
> more comments added.
> 

OK, will look.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-22 Thread Konstantin Knizhnik



On 19.01.2018 20:28, Tomas Vondra wrote:


With pgbouncer you will never be able to use prepared statements which
slows down simple queries almost twice (unless my patch with
autoprepared statements is committed).


I don't see why that wouldn't be possible? Perhaps not for prepared
statements with simple protocol, but I'm pretty sure it's doable for
extended protocol (which seems like a reasonable limitation).

That being said, I think it's a mistake to turn this thread into a
pgbouncer vs. the world battle. I could name things that are possible
only with standalone connection pool - e.g. pausing connections and
restarting the database without interrupting the clients.

But that does not mean built-in connection pool is not useful.


regards



Sorry, I do not understand how the extended protocol can help to handle 
prepared statements without a shared prepared statement cache or built-in 
connection pooling.
The problem is that now in Postgres most caches, including the catalog 
cache, relation cache and prepared statement cache, are private to a backend.
There is certainly one big advantage of such an approach: no need to 
synchronize access to the cache. But it seems to be the only advantage. 
And there are a lot of drawbacks:
inefficient use of memory, a complex invalidation mechanism, 
incompatibility with connection pooling...


So there are three possible ways (maybe more, but I know only three):
1. Implement built-in connection pooling which will be aware of proper 
use of local caches. This is what I have implemented with the proposed 
approach.
2. Implicit autoprepare. Clients will not be able to use the standard 
Postgres prepare mechanism, but the executor will try to generate a 
generic plan for ordinary queries. My implementation of this approach is 
at the commit fest.
3. Global caches. This seems to be the best solution but the most 
difficult to implement.


Actually I think that the discussion about the value of built-in 
connection pooling is very important.
Yes, external connection pooling is more flexible. It makes it possible to 
perform pooling either at the client side or at the server side (or even 
to combine the two approaches).
Also, external connection pooling for PostgreSQL is not limited to 
pgbouncer/pgpool.
There are many frameworks maintaining their own connection pool, for 
example J2EE, jboss, hibernate, ...
I have a feeling that about 70% of enterprise systems working with 
databases are written in Java and do connection pooling in their own way.

So maybe embedded connection pooling is not needed for such applications...
But what I have heard from many people is that Postgres' poor connection 
pooling is one of the main drawbacks of Postgres, complicating its usage 
in enterprise environments.


In any case, please find an updated patch with some code cleanup and more 
comments added.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index b945b15..8e8a737 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -813,3 +813,32 @@ build_regtype_array(Oid *param_types, int num_params)
 	result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i');
 	return PointerGetDatum(result);
 }
+
+/*
+ * Drop all statements prepared in the specified session.
+ */
+void
+DropSessionPreparedStatements(char const* sessionId)
+{
+	HASH_SEQ_STATUS seq;
+	PreparedStatement *entry;
+	size_t idLen = strlen(sessionId);
+
+	/* nothing cached */
+	if (!prepared_queries)
+		return;
+
+	/* walk over cache */
+	hash_seq_init(&seq, prepared_queries);
+	while ((entry = hash_seq_search(&seq)) != NULL)
+	{
+		if (strncmp(entry->stmt_name, sessionId, idLen) == 0 && entry->stmt_name[idLen] == '.')
+		{
+			/* Release the plancache entry */
+			DropCachedPlan(entry->plansource);
+
+			/* Now we can remove the hash table entry */
+			hash_search(prepared_queries, entry->stmt_name, HASH_REMOVE, NULL);
+		}
+	}
+}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index a4f6d4d..7f40edb 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1029,6 +1029,17 @@ pq_peekbyte(void)
 }
 
/* --------------------------------
 *		pq_available_bytes	- get number of buffered bytes available for reading.
 * --------------------------------
 */
+int
+pq_available_bytes(void)
+{
+	return PqRecvLength - PqRecvPointer;
+}
+
+/* 
  *		pq_getbyte_if_available - get a single byte from connection,
  *			if available
  *
diff --git a/src/backend/port/Makefile b/src/backend/port/Makefile
index aba1e92..56ec998 100644
--- a/src/backend/port/Makefile
+++ b/src/backend/port/Makefile
@@ -21,7 +21,7 @@ subdir = src/backend/port
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = atomics.o dynloader.o pg_sema.o pg_shmem.o $(TAS)
+OBJS = atomics.o dynloader.o pg_sema.o pg_shmem.o send_sock.o $(TAS)
 
 ifeq 

Re: Built-in connection pooling

2018-01-19 Thread Tomas Vondra


On 01/19/2018 07:35 PM, Claudio Freire wrote:
> 
> 
> On Fri, Jan 19, 2018 at 2:22 PM, Tomas Vondra wrote:
> 
> 
> 
> On 01/19/2018 06:13 PM, Claudio Freire wrote:
> >
> >
> > On Fri, Jan 19, 2018 at 2:07 PM, Konstantin Knizhnik wrote:
> >
> >
> >
> >
> >         Well, I haven't said it has to be single-threaded like
> >         pgbouncer. I
> >         don't see why the bgworker could not use multiple threads
> >         internally (of
> >         course, it'd need to be not to mess the stuff that is not
> >         thread-safe).
> >
> >
> >     Certainly architecture with N multiple scheduling bgworkers and M
> >     executors (backends) may be more flexible
> >     than solution when scheduling is done in executor itself. But we
> >     will have to pay extra cost for redirection.
> >     I am not sure that finally it will allow to reach better
> >     performance.
> >     More flexible solution in many cases doesn't mean more efficient
> >     solution.
> >
> >
> > I think you can take the best of both worlds.
> >
> > You can take your approach of passing around fds, and build a "load
> > balancing protocol" in a bgworker.
> >
> > The postmaster sends the socket to the bgworker, the bgworker
> waits for
> > a command as pgbouncer does, but instead of proxying everything, when
> > commands arrive, it passes the socket to a backend to handle.
> >
> > That way, the bgworker can do what pgbouncer does, handle different
> > pooling modes, match backends to databases, etc, but it doesn't
> have to
> > proxy all data, it just delegates handling of a command to a backend,
> > and forgets about that socket.
> >
> > Sounds like it could work.
> >
> 
> How could it do all that without actually processing all the data? For
> example, how could it determine the statement/transaction boundaries?
> 
> 
> It only needs to determine statement/transaction start.
> 
> After that, it hands off the connection to a backend, and the
> backend determines when to give it back.
> 
> So instead of processing all the data, it only processes a tiny part of it.
> 

How exactly would the backend "give back" the connection? The only way
for the backend and pgbouncer to communicate is by embedding information
in the data stream. Which means pgbouncer still has to parse it.

Furthermore, those are not the only bits of information pgbouncer may
need. For example, if pgbouncer gets improved to handle prepared
statements (which is likely) it'd need to handle PARSE/BIND/EXECUTE. And
it already needs to handle SET parameters. And so on.

In any case, this discussion is somewhat off topic in this thread, so
let's not hijack it.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-19 Thread Claudio Freire
On Fri, Jan 19, 2018 at 2:22 PM, Tomas Vondra wrote:

>
>
> On 01/19/2018 06:13 PM, Claudio Freire wrote:
> >
> >
> > On Fri, Jan 19, 2018 at 2:07 PM, Konstantin Knizhnik wrote:
> >
> >
> >
> >
> > Well, I haven't said it has to be single-threaded like
> > pgbouncer. I
> > don't see why the bgworker could not use multiple threads
> > internally (of
> > course, it'd need to be not to mess the stuff that is not
> > thread-safe).
> >
> >
> > Certainly architecture with N multiple scheduling bgworkers and M
> > executors (backends) may be more flexible
> > than solution when scheduling is done in executor itself. But we
> > will have to pay extra cost for redirection.
> > I am not sure that finally it will allow to reach better performance.
> > More flexible solution in many cases doesn't mean more efficient
> > solution.
> >
> >
> > I think you can take the best of both worlds.
> >
> > You can take your approach of passing around fds, and build a "load
> > balancing protocol" in a bgworker.
> >
> > The postmaster sends the socket to the bgworker, the bgworker waits for
> > a command as pgbouncer does, but instead of proxying everything, when
> > commands arrive, it passes the socket to a backend to handle.
> >
> > That way, the bgworker can do what pgbouncer does, handle different
> > pooling modes, match backends to databases, etc, but it doesn't have to
> > proxy all data, it just delegates handling of a command to a backend,
> > and forgets about that socket.
> >
> > Sounds like it could work.
> >
>
> How could it do all that without actually processing all the data? For
> example, how could it determine the statement/transaction boundaries?
>

It only needs to determine statement/transaction start.

After that, it hands off the connection to a backend, and the backend
determines when to give it back.

So instead of processing all the data, it only processes a tiny part of it.


Re: Built-in connection pooling

2018-01-19 Thread Tomas Vondra


On 01/19/2018 06:07 PM, Konstantin Knizhnik wrote:
> 
> ...
>
 3) Is there any sort of shrinking the pools? I mean, if the backend is
 idle for certain period of time (or when we need backends for other
 databases), does it get closed automatically?

>>> When a client is disconnected, the client session is closed. But the backend
>>> is not terminated even if there are no more sessions at this backend.
>>> It was done intentionally, to avoid permanent spawning of new processes
>>> when there are one or a few clients which frequently connect/disconnect to
>>> the database.
>>>
>> Sure, but it means a short peak will exhaust the backends indefinitely.
>> That's acceptable for a PoC, but I think needs to be fixed eventually.
>>
> Sorry, I do not understand it.
> You specify the size of the backend pool which will serve client sessions.
> The size of this pool is chosen to provide the best performance for the
> particular system and workload.
> So the number of backends will never exceed this optimal value even in case
> of a "short peak".
> From my point of view, terminating backends when there are no active
> sessions is a wrong idea in any case; it was not a temporary decision just
> for the PoC.
> 

That is probably true when there is just a single pool (for one
database/user). But when there are multiple such pools, it forces you to
keep the sum(pool_size) below max_connections. Which seems strange.

I do think the ability to evict backends after some timeout, or when
there is pressure in other pools (different user/database) is rather useful.

>>
>> Well, I haven't said it has to be single-threaded like pgbouncer. I
>> don't see why the bgworker could not use multiple threads internally (of
>> course, it'd need to be not to mess the stuff that is not thread-safe).
>>
> 
> Certainly architecture with N multiple scheduling bgworkers and M
> executors (backends) may be more flexible
> than solution when scheduling is done in executor itself. But we will
> have to pay extra cost for redirection.
>
> I am not sure that finally it will allow to reach better performance.
> More flexible solution in many cases doesn't mean more efficient solution.
> 

Sure, I wasn't really suggesting it's a clear win. I was responding to
your argument that pgbouncer in some cases reaches 100% CPU utilization
- that can be mitigated to a large extent by adding threads. Of course,
the cost of an extra level of indirection is not zero.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


