Re: Psycopg2 pool clarification

2017-06-10 Thread israel

On 2017-06-08 19:55, Ian Kelly wrote:
On Thu, Jun 8, 2017 at 10:47 AM, Israel Brewster wrote:

On Jun 7, 2017, at 10:31 PM, dieter  wrote:

israel  writes:

On 2017-06-06 22:53, dieter wrote:
...
As such, using psycopg2's pool is essentially
worthless for me (plenty of use for it, I'm sure, just not for me/my
use case).


Could you not simply adjust the value for the "min" parameter?
If you want at least "n" open connections, then set "min" to "n".


Well, sure, if I didn't care about wasting resources (which, I guess 
many people don't). I could set "n" to some magic number that would 
always give "enough" connections, such that my application never has 
to open additional connections, then adjust that number every few 
months as usage changes. In fact, now that I know how the logic of the 
pool works, that's exactly what I'm doing until I am confident that my 
caching replacement is solid.


Of course, in order to avoid having to open/close a bunch of 
connections during the times when it is most critical - that is, when 
the server is under heavy load - I have to set that number arbitrarily 
high. Furthermore, that means that much of the time many, if not most, 
of those connections would be idle. Each connection uses a certain 
amount of RAM on the server, not to mention using up limited 
connection slots, so now I've got to think about if my server is sized 
properly to be able to handle that load not just occasionally, but 
constantly - when reducing server load by reducing the frequency of 
connections being opened/closed was the goal in the first place. So 
all I've done is trade dynamic load for static load - increasing 
performance at the cost of resources, rather than more intelligently 
using the available resources. All-in-all, not the best solution, 
though it does work. Maybe if load were fairly constant it would make 
more sense, though. So, like I said, not a great fit for *my* use case, 
which is a number of web apps with varying loads, loads that also vary 
from day-to-day and hour-to-hour.


On the other hand, a pool that caches connections using the logic I 
laid out in my original post would avoid the issue. Under heavy load, 
it could open additional connections as needed - a performance penalty 
for the first few users over the min threshold, but only the first 
few, rather than all the users over a certain threshold ("n"). Those 
connections would then remain available for the duration of the load, 
so it doesn't need to open/close numerous connections. Then, during 
periods of lighter load, the unused connections can drop off, freeing 
up server resources for other uses. A well-written pool could even do 
something like see that the available connection pool is running low, 
and open a few more connections in the background, thus completely 
avoiding the connection overhead on requests while never having more 
than a few "extra" connections at any given time. Even if you left off 
the expiration logic, it would still be an improvement, because while 
unused connections wouldn't drop, the "n" open connections could scale 
up dynamically until you have "enough" connections, without having to 
figure out and hard-code that "magic number" of open connections.


Why wouldn't I want something like that? It's not like it's hard to 
code - took me about an hour and a half to get to a working prototype 
yesterday. Still need to write tests and add some polish, but it 
works. Perhaps, though, the common thought is just "throw more 
hardware at it and keep a lot of connections open at all times"? Maybe 
I was raised too conservatively, or the company I work for is too 
poor :-D


Psycopg is first and foremost a database adapter. To quote from the
psycopg2.pool module documentation, "This module offers a few pure
Python classes implementing *simple* connection pooling directly in
the client application" (emphasis added). The advertised list of
features at http://initd.org/psycopg/features/ doesn't even mention
connection pooling. In short, you're getting what you paid for.

It sounds like your needs are beyond what the psycopg2.pool module
provides.


Quite possible. Thus the reason I was looking for clarification on how 
the module was intended to work - if it doesn't work in the way that I 
want it to, I need to look elsewhere for a solution. My main reason for 
posting this thread was that I was expecting it to work one way, but 
testing showed it working another way, so I was trying to find out if 
that was intentional or user error. Apparently it's intentional, so 
there we go - in its current form at least, my needs are beyond what 
the psycopg2 pool provides. Fair enough.



I suggest looking into a dedicated connection pooler like
PgBouncer. You'll find that it's much more feature-rich and
configurable than psycopg2.pool. It's production-ready, unlike your
prototype. And since it's a proxy, it can take connections from
multiple client apps and tune the pool to your overall load rather 
than on an app-by-app basis (and thus risk overloading the backend if 
multiple apps unexpectedly peak together).

Re: Psycopg2 pool clarification

2017-06-08 Thread Ian Kelly
On Thu, Jun 8, 2017 at 10:47 AM, Israel Brewster  wrote:
>> On Jun 7, 2017, at 10:31 PM, dieter  wrote:
>>
>> israel  writes:
>>> On 2017-06-06 22:53, dieter wrote:
>>> ...
>>> As such, using psycopg2's pool is essentially
>>> worthless for me (plenty of use for it, I'm sure, just not for me/my
>>> use case).
>>
>> Could you not simply adjust the value for the "min" parameter?
>> If you want at least "n" open connections, then set "min" to "n".
>
> Well, sure, if I didn't care about wasting resources (which, I guess many 
> people don't). I could set "n" to some magic number that would always give 
> "enough" connections, such that my application never has to open additional 
> connections, then adjust that number every few months as usage changes. In 
> fact, now that I know how the logic of the pool works, that's exactly what 
> I'm doing until I am confident that my caching replacement is solid.
>
> Of course, in order to avoid having to open/close a bunch of connections 
> during the times when it is most critical - that is, when the server is under 
> heavy load - I have to set that number arbitrarily high. Furthermore, that 
> means that much of the time many, if not most, of those connections would be 
> idle. Each connection uses a certain amount of RAM on the server, not to 
> mention using up limited connection slots, so now I've got to think about if 
> my server is sized properly to be able to handle that load not just 
> occasionally, but constantly - when reducing server load by reducing the 
> frequency of connections being opened/closed was the goal in the first place. 
> So all I've done is trade dynamic load for static load - increasing 
> performance at the cost of resources, rather than more intelligently using 
> the available resources. All-in-all, not the best solution, though it does 
> work. Maybe if load were fairly constant it would make more sense, though. So, 
> like I said, not a great fit for *my* use case, which is a number of web apps 
> with varying loads, loads that also vary from day-to-day and hour-to-hour.
>
> On the other hand, a pool that caches connections using the logic I laid out 
> in my original post would avoid the issue. Under heavy load, it could open 
> additional connections as needed - a performance penalty for the first few 
> users over the min threshold, but only the first few, rather than all the 
> users over a certain threshold ("n"). Those connections would then remain 
> available for the duration of the load, so it doesn't need to open/close 
> numerous connections. Then, during periods of lighter load, the unused 
> connections can drop off, freeing up server resources for other uses. A 
> well-written pool could even do something like see that the available 
> connection pool is running low, and open a few more connections in the 
> background, thus completely avoiding the connection overhead on requests 
> while never having more than a few "extra" connections at any given time. 
> Even if you left off the expiration logic, it would still be an improvement, 
> because while unused connections wouldn't drop, the "n" open connections could 
> scale up dynamically until you have "enough" connections, without having to 
> figure out and hard-code that "magic number" of open connections.
>
> Why wouldn't I want something like that? It's not like it's hard to code - 
> took me about an hour and a half to get to a working prototype yesterday. 
> Still need to write tests and add some polish, but it works. Perhaps, though, 
> the common thought is just "throw more hardware at it and keep a lot of 
> connections open at all times"? Maybe I was raised too conservatively, or the 
> company I work for is too poor :-D

Psycopg is first and foremost a database adapter. To quote from the
psycopg2.pool module documentation, "This module offers a few pure
Python classes implementing *simple* connection pooling directly in
the client application" (emphasis added). The advertised list of
features at http://initd.org/psycopg/features/ doesn't even mention
connection pooling. In short, you're getting what you paid for.

It sounds like your needs are beyond what the psycopg2.pool module
provides. I suggest looking into a dedicated connection pooler like
PgBouncer. You'll find that it's much more feature-rich and
configurable than psycopg2.pool. It's production-ready, unlike your
prototype. And since it's a proxy, it can take connections from
multiple client apps and tune the pool to your overall load rather
than on an app-by-app basis (and thus risk overloading the backend if
multiple apps unexpectedly peak together).
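
For what it's worth, PgBouncer is largely transparent on the client side: the
application keeps using plain psycopg2, just pointed at the proxy instead of
Postgres itself. A minimal sketch, assuming a PgBouncer instance listening on
localhost:6432 (its default port) in front of the real server; the database and
user names below are simply the ones from the log excerpts elsewhere in this
thread, and the password is a placeholder:

import psycopg2

# Connect to the PgBouncer proxy rather than directly to Postgres.
conn = psycopg2.connect(
    host="127.0.0.1",
    port=6432,          # PgBouncer's listen port, not Postgres's 5432
    dbname="flightlogs",
    user="logger",
    password="secret",  # placeholder
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchone())
cur.close()
conn.close()  # PgBouncer keeps the underlying server connection for reuse

Pool sizing, idle timeouts, and so on are then tuned in PgBouncer's own
configuration rather than in each application.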

As for why psycopg2.pool is the way it is, maybe most users don't have
your situation of serving multiple apps with loads varying on
different cycles. Most are probably only serving a single app, or if
serving multiple apps then they likely have common user bases with
similar peak times. You can't dynamically adjust the amount of RAM in
your server, so saving re

Re: Psycopg2 pool clarification

2017-06-08 Thread Israel Brewster

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




> On Jun 7, 2017, at 10:31 PM, dieter  wrote:
> 
> israel  writes:
>> On 2017-06-06 22:53, dieter wrote:
>> ...
>> As such, using psycopg2's pool is essentially
>> worthless for me (plenty of use for it, I'm sure, just not for me/my
>> use case).
> 
> Could you not simply adjust the value for the "min" parameter?
> If you want at least "n" open connections, then set "min" to "n".

Well, sure, if I didn't care about wasting resources (which, I guess many 
people don't). I could set "n" to some magic number that would always give 
"enough" connections, such that my application never has to open additional 
connections, then adjust that number every few months as usage changes. In 
fact, now that I know how the logic of the pool works, that's exactly what I'm 
doing until I am confident that my caching replacement is solid. 

Of course, in order to avoid having to open/close a bunch of connections during 
the times when it is most critical - that is, when the server is under heavy 
load - I have to set that number arbitrarily high. Furthermore, that means that 
much of the time many, if not most, of those connections would be idle. Each 
connection uses a certain amount of RAM on the server, not to mention using up 
limited connection slots, so now I've got to think about if my server is sized 
properly to be able to handle that load not just occasionally, but constantly - 
when reducing server load by reducing the frequency of connections being 
opened/closed was the goal in the first place. So all I've done is trade 
dynamic load for static load - increasing performance at the cost of resources, 
rather than more intelligently using the available resources. All-in-all, not 
the best solution, though it does work. Maybe if load were fairly constant it 
would make more sense, though. So, like I said, not a great fit for *my* use 
case, which is a number of web apps with varying loads, loads that also vary 
from day-to-day and hour-to-hour.

On the other hand, a pool that caches connections using the logic I laid out in 
my original post would avoid the issue. Under heavy load, it could open 
additional connections as needed - a performance penalty for the first few 
users over the min threshold, but only the first few, rather than all the users 
over a certain threshold ("n"). Those connections would then remain available 
for the duration of the load, so it doesn't need to open/close numerous 
connections. Then, during periods of lighter load, the unused connections can 
drop off, freeing up server resources for other uses. A well-written pool could 
even do something like see that the available connection pool is running low, 
and open a few more connections in the background, thus completely avoiding the 
connection overhead on requests while never having more than a few "extra" 
connections at any given time. Even if you left off the expiration logic, it 
would still be an improvement, because while unused connections wouldn't drop, 
the "n" open connections could scale up dynamically until you have "enough" 
connections, without having to figure out and hard-code that "magic number" of 
open connections.

Why wouldn't I want something like that? It's not like it's hard to code - took 
me about an hour and a half to get to a working prototype yesterday. Still need 
to write tests and add some polish, but it works. Perhaps, though, the common 
thought is just "throw more hardware at it and keep a lot of connections open 
at all times"? Maybe I was raised too conservatively, or the company I work for 
is too poor :-D
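
To make the idea concrete, here is a minimal sketch of the kind of caching pool
described above - not the actual prototype, just an illustration. The class
name, parameters, and idle_timeout behavior are made up for this example, and it
wraps psycopg2.connect() directly rather than building on psycopg2.pool:

import time
import threading
import psycopg2

class CachingConnectionPool:
    """Sketch only: open connections on demand up to maxconn, keep idle
    ones around for reuse, and drop any that sit unused for longer than
    idle_timeout seconds (but never drop below minconn)."""

    def __init__(self, minconn, maxconn, idle_timeout=300, **connect_kwargs):
        self._minconn = minconn
        self._maxconn = maxconn
        self._idle_timeout = idle_timeout
        self._connect_kwargs = connect_kwargs
        self._lock = threading.Lock()
        self._idle = []      # list of (connection, time it was returned)
        self._in_use = 0
        for _ in range(minconn):
            self._idle.append((psycopg2.connect(**connect_kwargs), time.time()))

    def getconn(self):
        with self._lock:
            self._prune()
            if self._idle:
                conn, _ = self._idle.pop()        # reuse a cached connection
            elif self._in_use < self._maxconn:
                conn = psycopg2.connect(**self._connect_kwargs)
            else:
                raise RuntimeError("connection pool exhausted")
            self._in_use += 1
            return conn

    def putconn(self, conn, close=False):
        with self._lock:
            self._in_use -= 1
            if close or conn.closed:
                if not conn.closed:
                    conn.close()
            else:
                # Cache the connection for reuse instead of closing it.
                self._idle.append((conn, time.time()))
            self._prune()

    def _prune(self):
        """Close cached connections that have been idle too long."""
        now = time.time()
        keep = []
        for conn, returned_at in self._idle:
            total_open = self._in_use + len(keep)
            if now - returned_at > self._idle_timeout and total_open >= self._minconn:
                conn.close()
            else:
                keep.append((conn, returned_at))
        self._idle = keep

Usage mirrors the psycopg2 pool (pool.getconn() / pool.putconn(conn)), except
that returned connections stay open and get reused until they have sat idle for
idle_timeout seconds.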




Re: Psycopg2 pool clarification

2017-06-07 Thread dieter
israel  writes:
> On 2017-06-06 22:53, dieter wrote:
> ...
> As such, using psycopg2's pool is essentially
> worthless for me (plenty of use for it, I'm sure, just not for me/my
> use case).

Could you not simply adjust the value for the "min" parameter?
If you want at least "n" open connections, then set "min" to "n".



Re: Psycopg2 pool clarification

2017-06-07 Thread israel

On 2017-06-06 22:53, dieter wrote:

israel  writes:

Since I've gotten no replies to this, I was wondering if someone could
at least confirm which behavior (my expected or my observed) is
*supposed* to be the correct? Should a psycopg2 pool keep connections
open when returned to the pool (if closed is False), or should it
close them as long as there is more than minconn open? I.e., is my
observed behavior a bug or a feature?


You should ask the author[s] of "psycopg2" about the supposed behavior.


From my point of view, everything depends on the meaning of the "min"
and "max" parameters for the pool.

You seem to interpret "max" as "keep as many connections as this 
open".

But it can also be a hard limit in the form "never open more than this
number of connections". In the latter case, "min" may mean "keep this
many connections open at all time".


You are right about my interpretation of "max", and also about the 
actual meaning. Thus the reason I was asking :-). I did post on the bug 
report forum, and was informed that the observed behavior was the 
correct behavior. As such, using psycopg2's pool is essentially 
worthless for me (plenty of use for it, I'm sure, just not for me/my use 
case).


So let me ask a different, but related, question: Is there a Python 
library available that gives me the behavior I described in my first 
post, where connections are "cached" for future use for a time? Or 
should I just write my own? I didn't find anything with some quick 
googling, other than middleware servers like pgpool which, while they 
have the behavior I want (at least from my reading), will still require 
the overhead of making a connection (perhaps less than direct to 
postgres? Any performance comparisons out there?), not to mention 
keeping yet another service configured/running. I would prefer to keep 
the pool internal to my application, if possible, and simply reuse 
existing connections rather than making new ones. Thanks!



Re: Psycopg2 pool clarification

2017-06-06 Thread dieter
israel  writes:
> Since I've gotten no replies to this, I was wondering if someone could
> at least confirm which behavior (my expected or my observed) is
> *supposed* to be the correct? Should a psycopg2 pool keep connections
> open when returned to the pool (if closed is False), or should it
> close them as long as there is more than minconn open? I.e., is my
> observed behavior a bug or a feature?

You should ask the author[s] of "psycopg2" about the supposed behavior.


From my point of view, everything depends on the meaning of the "min"
and "max" parameters for the pool.

You seem to interpret "max" as "keep as many connections as this open".
But it can also be a hard limit in the form "never open more than this
number of connections". In the latter case, "min" may mean "keep this
many connections open at all time".



Re: Psycopg2 pool clarification

2017-06-06 Thread israel
Since I've gotten no replies to this, I was wondering if someone could 
at least confirm which behavior (my expected or my observed) is 
*supposed* to be the correct? Should a psycopg2 pool keep connections 
open when returned to the pool (if closed is False), or should it close 
them as long as there is more than minconn open? I.e., is my observed 
behavior a bug or a feature?


On 2017-06-02 15:06, Israel Brewster wrote:

I've been using the psycopg2 pool class for a while now, using code
similar to the following:


pool=ThreadedConnectionPool(0,5,)
conn1=pool.getconn()

pool.putconn(conn1)

 repeat later, or perhaps "simultaneously" in a different thread.

and my understanding was that the pool logic was something like the 
following:


- create a "pool" of connections, with an initial number of
connections equal to the "minconn" argument
- When getconn is called, see if there is an available connection. If
so, return it. If not, open a new connection and return that (up to
"maxconn" total connections)
- When putconn is called, return the connection to the pool for
re-use, but do *not* close it (unless the close argument is specified
as True, documentation says default is False)
- On the next request to getconn, this connection is now available and
so no new connection will be made
- perhaps (or perhaps not), after some time, unused connections would
be closed and purged from the pool to prevent large numbers of
used-only-once connections from lying around.

However, in some testing I just did, this doesn't appear to be the
case, at least based on the postgresql logs. Running the following
code:


pool=ThreadedConnectionPool(0,5,)
conn1=pool.getconn()
conn2=pool.getconn()
pool.putconn(conn1)
pool.putconn(conn2)
conn3=pool.getconn()
pool.putconn(conn3)


produced the following output in the postgresql log:

2017-06-02 14:30:26 AKDT LOG:  connection received: host=::1 port=64786
2017-06-02 14:30:26 AKDT LOG:  connection authorized: user=logger
database=flightlogs
2017-06-02 14:30:35 AKDT LOG:  connection received: host=::1 port=64788
2017-06-02 14:30:35 AKDT LOG:  connection authorized: user=logger
database=flightlogs
2017-06-02 14:30:46 AKDT LOG:  disconnection: session time:
0:00:19.293 user=logger database=flightlogs host=::1 port=64786
2017-06-02 14:30:53 AKDT LOG:  disconnection: session time:
0:00:17.822 user=logger database=flightlogs host=::1 port=64788
2017-06-02 14:31:15 AKDT LOG:  connection received: host=::1 port=64790
2017-06-02 14:31:15 AKDT LOG:  connection authorized: user=logger
database=flightlogs
2017-06-02 14:31:20 AKDT LOG:  disconnection: session time:
0:00:05.078 user=logger database=flightlogs host=::1 port=64790

Since I set the maxconn parameter to 5, and only used 3 connections, I
wasn't expecting to see any disconnects - and yet as soon as I do
putconn, I *do* see a disconnection. Additionally, I would have
thought that when I pulled connection 3, there would have been two
connections available, and so it wouldn't have needed to connect
again, yet it did. Even if I explicitly say close=False in the putconn
call, it still closes the connection and has to open a new one.

What am I missing? From this testing, it looks like I get no benefit
at all from having the connection pool, unless you consider an upper
limit to the number of simultaneous connections a benefit? :-) Maybe a
little code savings from not having to manually call connect and close
after each connection, but that's easily gained by simply writing a
context manager. I could get *some* limited benefit by raising the
minconn value, but then I risk having connections that are *never*
used, yet still taking resources on the DB server.

Ideally, it would open as many connections as are needed, and then
leave them open for future requests, perhaps with an "idle" timeout.
Is there any way to achieve this behavior?

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




Psycopg2 pool clarification

2017-06-02 Thread Israel Brewster
I've been using the psycopg2 pool class for a while now, using code similar to 
the following:

>>> pool=ThreadedConnectionPool(0,5,)
>>> conn1=pool.getconn()
>>> 
>>> pool.putconn(conn1)
 repeat later, or perhaps "simultaneously" in a different thread.

and my understanding was that the pool logic was something like the following:

- create a "pool" of connections, with an initial number of connections equal 
to the "minconn" argument
- When getconn is called, see if there is an available connection. If so, 
return it. If not, open a new connection and return that (up to "maxconn" total 
connections)
- When putconn is called, return the connection to the pool for re-use, but do 
*not* close it (unless the close argument is specified as True, documentation 
says default is False)
- On the next request to getconn, this connection is now available and so no 
new connection will be made
- perhaps (or perhaps not), after some time, unused connections would be closed 
and purged from the pool to prevent large numbers of used-only-once connections 
from lying around.

However, in some testing I just did, this doesn't appear to be the case, at 
least based on the postgresql logs. Running the following code:

>>> pool=ThreadedConnectionPool(0,5,)
>>> conn1=pool.getconn()
>>> conn2=pool.getconn()
>>> pool.putconn(conn1)
>>> pool.putconn(conn2)
>>> conn3=pool.getconn()
>>> pool.putconn(conn3)

produced the following output in the postgresql log:

2017-06-02 14:30:26 AKDT LOG:  connection received: host=::1 port=64786
2017-06-02 14:30:26 AKDT LOG:  connection authorized: user=logger 
database=flightlogs
2017-06-02 14:30:35 AKDT LOG:  connection received: host=::1 port=64788
2017-06-02 14:30:35 AKDT LOG:  connection authorized: user=logger 
database=flightlogs
2017-06-02 14:30:46 AKDT LOG:  disconnection: session time: 0:00:19.293 
user=logger database=flightlogs host=::1 port=64786
2017-06-02 14:30:53 AKDT LOG:  disconnection: session time: 0:00:17.822 
user=logger database=flightlogs host=::1 port=64788
2017-06-02 14:31:15 AKDT LOG:  connection received: host=::1 port=64790
2017-06-02 14:31:15 AKDT LOG:  connection authorized: user=logger 
database=flightlogs
2017-06-02 14:31:20 AKDT LOG:  disconnection: session time: 0:00:05.078 
user=logger database=flightlogs host=::1 port=64790

Since I set the maxconn parameter to 5, and only used 3 connections, I wasn't 
expecting to see any disconnects - and yet as soon as I do putconn, I *do* see 
a disconnection. Additionally, I would have thought that when I pulled 
connection 3, there would have been two connections available, and so it 
wouldn't have needed to connect again, yet it did. Even if I explicitly say 
close=False in the putconn call, it still closes the connection and has to open 
a new one.
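
(As an aside, a quick way to watch this from Python instead of tailing the
server log is to count backends in pg_stat_activity - a rough sketch, assuming
the connecting role is allowed to see those rows; the dbname and user are just
the values from the log excerpts above:

import psycopg2

# Count open backends for the database in question.
conn = psycopg2.connect(dbname="flightlogs", user="logger")
cur = conn.cursor()
cur.execute("SELECT count(*) FROM pg_stat_activity WHERE datname = %s",
            ("flightlogs",))
print(cur.fetchone()[0])
cur.close()
conn.close()

Note that this connection itself counts as one backend.)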

What am I missing? From this testing, it looks like I get no benefit at all 
from having the connection pool, unless you consider an upper limit to the 
number of simultaneous connections a benefit? :-) Maybe a little code savings 
from not having to manually call connect and close after each connection, but 
that's easily gained by simply writing a context manager. I could get *some* 
limited benefit by raising the minconn value, but then I risk having 
connections that are *never* used, yet still taking resources on the DB server.
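
For completeness, the kind of context manager meant here is only a few lines -
a rough sketch, with the connection parameters left as whatever you would
normally pass to psycopg2.connect():

import contextlib
import psycopg2

@contextlib.contextmanager
def db_connection(**connect_kwargs):
    # Open a connection, hand it to the caller, and make sure it is
    # committed (or rolled back on error) and closed afterwards.
    conn = psycopg2.connect(**connect_kwargs)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# Usage:
# with db_connection(dbname="flightlogs", user="logger") as conn:
#     cur = conn.cursor()
#     cur.execute("SELECT 1")

Of course this still opens and closes a real connection every time, which is
exactly the overhead a cache would avoid.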

Ideally, it would open as many connections as are needed, and then leave them 
open for future requests, perhaps with an "idle" timeout. Is there any way to 
achieve this behavior?

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---



