Re: libpq async connection and multiple hosts

2023-10-25 Thread Daniele Varrazzo
On Thu, 26 Oct 2023, 00:10 Jelte Fennema,  wrote:

> On Wed, 25 Oct 2023 at 18:54, Daniele Varrazzo
>  wrote:
> > - connect_timeout
> > - multiple host, hostaddr, port
> > - load_balance_hosts=random
> >
> > Does this list sound complete?
>
> I think you'd also want to resolve the hostnames to IPs yourself and
> iterate over those one-by-one. Otherwise if the first IP returned for
> the hostname times out, you will never connect to the others.
>

For async connections we were already unpacking and processing the hosts
list, in order to perform non-blocking resolution and populate the
hostaddr. This already accounted for the possibility of one host resolving
to more than one address. But then we would have packed everything back
into a single conninfo and made a single connection attempt.

https://github.com/psycopg/psycopg/blob/14740add6bb1aebf593a65245df21699daabfad5/psycopg/psycopg/conninfo.py#L278

The goal here was only non-blocking name resolution. Ahaini understand we
should do is to split on the hosts for sync connections too, shuffle if
requested, and make separate  connection attempts.

-- Daniele

>


Re: libpq async connection and multiple hosts

2023-10-25 Thread Daniele Varrazzo
On Wed, 25 Oct 2023 at 17:35, Jelte Fennema  wrote:

> Another approach is to use tcp_user_timeout instead of connect_timeout
> to skip non-responsive hosts. It's not completely equivalent though to
> connection_timeout though, since it also applies when the connection
> is actually being used. Also it only works on Linux afaik. It could be
> nice to add support for BSD its TCP_CONNECTIONTIMEOUT socket option.

This seems brittle and platform-dependent enough that we would surely
receive less grief by hardcoding a default two minutes timeout.

-- Daniele




Re: libpq async connection and multiple hosts

2023-10-25 Thread Daniele Varrazzo
On Wed, 25 Oct 2023 at 17:35, Jelte Fennema  wrote:

> You should implement load_balance_hosts=random though
> by randomizing your hosts list.

Good catch. So it seems that, if someone wants to build an equivalent
an async version of PQconnectdb, they need to handle on their own:

- connect_timeout
- multiple host, hostaddr, port
- load_balance_hosts=random

Does this list sound complete?

-- Daniele




libpq async connection and multiple hosts

2023-10-25 Thread Daniele Varrazzo
Hello,

We are aware that, using async connection functions (`PQconnectStart`,
`PQconnectPoll`), the `connect_timeout` parameter is not supported;
this is documented at
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PQCONNECTSTARTPARAMS

"""
The connect_timeout connection parameter is ignored when using
PQconnectPoll; it is the application's responsibility to decide
whether an excessive amount of time has elapsed. Otherwise,
PQconnectStart followed by a PQconnectPoll loop is equivalent to
PQconnectdb.
"""

However, ISTM that connecting to multiple hosts is not supported
either. I have a couple of issues I am looking into in psycopg 3:

- https://github.com/psycopg/psycopg/issues/602
- https://github.com/psycopg/psycopg/issues/674

Do we have to reimplement the connection attempts loop too?

Are there other policies that we would need to reimplement? Is
`target_session_attrs` taken care of by PQconnectPoll?

On my box (testing with psql and libpq itself),
PQconnect("host=8.8.8.8") fails after 2m10s. Is this the result of
some unspecified socket connection timeout on my Ubuntu machine?.

If we need to reimplement async connection to "host=X,Y", we will need
to use a timeout even if the user didn't specify one, otherwise we
will never stop the connection attempt to X and move to Y. What
timeout can we specify that will not upset anyone?

Thank you very much

-- Daniele




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-05-23 Thread Daniele Varrazzo
On Sat, 20 May 2023 at 00:01, Jacob Champion  wrote:

> - Some clients in the wild (psycopg2/psycopg) suppress all notifications
> during PQconnectPoll().

If there is anything we can improve in psycopg please reach out.

-- Daniele




$1 IS NULL with unknown type

2022-11-21 Thread Daniele Varrazzo
Hello,

The operator `IS NULL` doesn't work if the argument has unknown type.
In psycopg 3:

>>> conn.execute("select %s is null", ['foo']).fetchone()
IndeterminateDatatype: could not determine data type of parameter $1

This can get in the way of using the unknown type for strings (but
specifying the text oid for strings is worse, because there is no
implicit cast from string to most types).

It doesn't seem necessary to specify a type for an argument if it only
has to be compared with null: nullness is independent from the type
and is even specified, in the query parameters, in a separate array
from the parameter values.

Maybe this behaviour can be relaxed?

Cheers

-- Daniele




Re: Libpq single-row mode slowness

2022-05-01 Thread Daniele Varrazzo
On Sun, 1 May 2022 at 23:12, Tom Lane  wrote:

> The usual expectation is that you call PQconsumeInput to get rid of
> a read-ready condition on the socket.  If you don't have a poll() or
> select() or the like in the loop, you might be wasting a lot of
> pointless recvfrom calls.  You definitely don't need to call
> PQconsumeInput if PQisBusy is already saying that a result is available,
> and in single-row mode it's likely that several results can be consumed
> per recvfrom call.

This makes sense and, with some refactoring of our fetch loop, the
overhead of using single-row mode is now down to about 3x, likely
caused by the greater overhead in Python calls.

Please note that the insight you gave in your answer seems to
contradict the documentation. Some excerpts of
https://www.postgresql.org/docs/current/libpq-async.html:

"""
PQconsumeInput:  "After calling PQconsumeInput , the application can
check PQisBusy and/or PQnotifies to see if their state has changed"

PQisBusy: "will not itself attempt to read data from the server;
therefore PQconsumeInput must be invoked first, or the busy state will
never end."

...
A typical application [will use select()]. When the main loop detects
input ready, it should call PQconsumeInput to read the input. It can
then call PQisBusy, followed by PQgetResult if PQisBusy returns false
(0).
"""

All these indications give the impression that there is a sort of
mandatory order, requiring to call first PQconsumeInput, then
PQisBusy. As a consequence, the core of our function to fetch a single
result was implemented as:

```
def fetch(pgconn):
while True:
pgconn.consume_input()
if not pgconn.is_busy():
break
yield Wait.R

return pgconn.get_result()
```

(Where the `yield Wait.R` suspends this execution to call into
select() or whatever waiting policy the program is using.)

Your remarks suggest that PQisBusy() can be called before
PQconsumeInput(), and that the latter doesn't need to be called if not
busy. As such I have modified the loop to be:

```
def fetch(pgconn):
if pgconn.is_busy():
yield Wait.R
while True:
pgconn.consume_input()
if not pgconn.is_busy():
break
yield Wait.R

return pgconn.get_result()
```

which seems to work well: tests don't show regressions and single-row
mode doesn't waste recvfrom() anymore.

Is this new fetching pattern the expected way to interact with the libpq?

If so, should we improve the documentation to suggest that there are
reasons to call PQisBusy before PQconsumeInput? Especially in the
single-row mode docs page, which doesn't make relevant mentions to the
use of these functions.

Thank you very much for your help, really appreciated.

-- Daniele




Libpq single-row mode slowness

2022-05-01 Thread Daniele Varrazzo
Hello,

a Psycopg 3 user has tested what boils down pretty much to a
"generate_series(100K)" and has reported a 100x difference between
fetching it normally and in single-row mode. I have repeated the test
myself and I have found a 50x difference (the OP doesn't specify which
platform is using, mine is Linux).

Of course calling PQconsumeInput 100K times has some overhead compared
to calling it only once. However, I would like to know if this level
of overhead is expected, or if instead anyone smells some wasted
cycles.

According to some profiling, the largest part of the time is spent
inside a libc function I don't know the symbol of, called by
pqReadData(). Details and pretty graphs are available at
https://github.com/psycopg/psycopg/issues/286

The operations we perform, for every row, are PQconsumeInput,
PQisBusy, PQgetResult. Every PQconsumeInput results in a recvfrom()
syscall, of which the first one returns the whole recordset, the
following ones return EAGAIN. There are two extra cycles: one to get
the TUPLES_OK result, one to get the end-of-stream NULL. It seems the
documented usage pattern, but I'm not sure if I'm not misreading it,
especially in the light of this libpq grumble [1].

[1] 
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L681

Our connection is in non-blocking mode and we see the need for waiting
(using epoll here) only on the first call. The resulting strace of the
entire query process (of two rows) is:

21:36:53.659529 sendto(3, "P\0\0\0>\0select 'test' a, t b from "...,
108, MSG_NOSIGNAL, NULL, 0) = 108
21:36:53.660236 recvfrom(3, 0x1f6a870, 16384, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.660589 epoll_create1(EPOLL_CLOEXEC) = 4
21:36:53.660848 epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT,
{u32=3, u64=3}}) = 0
21:36:53.661099 epoll_wait(4, [{EPOLLIN, {u32=3, u64=3}}], 1023, 10) = 1
21:36:53.661941 recvfrom(3,
"1\0\0\0\0042\0\0\0\4T\0\0\0.\0\2a\0\0\0\0\0\0\0\0\0\0\31\377\377\377"...,
16384, 0, NULL, NULL) = 117
21:36:53.662506 close(4)= 0
21:36:53.662830 recvfrom(3, 0x1f6a898, 16344, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.663162 recvfrom(3, 0x1f6a884, 16364, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.663359 recvfrom(3, 0x1f6a876, 16378, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)

The test is on a localhost connection with sslmode disabled using libpq 14.2.

Is this the correct usage? Any insight is welcome.

Thank you very much!

-- Daniele




Is it safe to use the extended protocol with COPY?

2021-09-01 Thread Daniele Varrazzo
Hello,

in psycopg 3 we are currently using PQexecParams - although with no
params - to send COPY commands. The reason is mostly to avoid people
to send COPY together with other statements. Especially if other
operations are chained after COPY: we would only notice them after
copy is finished. Data changes might have been applied by then, so
throwing an exception seems impolite (the result might have been
applied already) but managing the result is awkward too.

Someone [1] has pointed out this conversation [2] which suggests that
COPY with extended protocol might break in the future.

[1] https://github.com/psycopg/psycopg/issues/78
[2] 
https://www.postgresql.org/message-id/flat/CAMsr%2BYGvp2wRx9pPSxaKFdaObxX8DzWse%2BOkWk2xpXSvT0rq-g%40mail.gmail.com#camsr+ygvp2wrx9ppsxakfdaobxx8dzwse+okwk2xpxsvt0r...@mail.gmail.com

As far as PostgreSQL is concerned, would it be better to stick to
PQexec with COPY, and if people append statements afterwards they
would be the ones to deal with the consequences? (being the server
applying the changes, the client throwing an exception)

Thank you very much

-- Daniele




Re: Is a connection max lifetime useful in a connection pool?

2021-02-21 Thread Daniele Varrazzo
On Sun, 21 Feb 2021 at 19:12, Pavel Stehule  wrote:

> I have very strong experience - it is very useful.

On Sun, 21 Feb 2021 at 19:26, Stephen Frost  wrote:

> Short answer- yes.

Sounds good. Thank you very much for your insight!

-- Daniele




Is a connection max lifetime useful in a connection pool?

2021-02-21 Thread Daniele Varrazzo
Hello,

I am designing and implementing a connection pool for psycopg3 [1][2].
Some of the inspiration is coming from HikariCP [3], a Java connection
pool.

One of the HikariCP configuration parameters is "maxLifetime", whose
description is: "This property controls the maximum lifetime of a
connection in the pool. [...] **We strongly recommend setting this
value, and it should be several seconds shorter than any database or
infrastructure imposed connection time limit.**" (bold is theirs,
default value is 30 mins).

When discussing the pool features in the psycopg mailing list someone
pointed out "what is the utility of this parameter? connections don't
rot, do they?"

Hikari is a generic connection pool, not one specific for Postgres. So
I'm wondering: is there any value in periodically deleting and
recreating connections for a Postgres-specific connection pool? Is a
MaxLifetime parameter useful?

Thank you very much,

-- Daniele

[1]: https://www.psycopg.org/articles/2021/01/17/pool-design/
[2]: 
https://github.com/psycopg/psycopg3/blob/connection-pool/psycopg3/psycopg3/pool.py
[3]: https://github.com/brettwooldridge/HikariCP




Types info on binary copy

2021-01-05 Thread Daniele Varrazzo
Hello,

The PGresult structure returned on COPY ... FROM provides a handful of
information (https://www.postgresql.org/docs/13/libpq-copy.html):
number of fields expected and whether they are binary or not.

However it doesn't provide the types expected. For binary copy the
exact type is important: playing with it, it seems that no cast
whatsoever is applied and the errors reported are relatively low
level. For instance, passing an int4 value to an int8 field causes a
"protocol_violation: insufficient data left in message"; the other way
around is greeted with "invalid_binary_representation: incorrect
binary data format".

Naively, it would seem that once a "COPY ... FROM STDIN" is executed
successfully, the server has a pretty good idea of what data types it
is expecting. I'm wondering: is it absurd to ask for this info to be
returned as RowDescription and to be exposed by the libpq PQftype()?

Cheers

-- Daniele




Use of pager, help, localization:

2018-10-15 Thread Daniele Varrazzo
>From psql/help.c

/*
* Keep this line count in sync with the number of lines printed below!
* Use "psql --help=variables | wc" to count correctly; but notice that
* Windows builds currently print one more line than non-Windows builds.
* Using the larger number is fine.
*/
output = PageOutput(156, pager ? &(pset.popt.topt) : NULL);

uhm... this doesn't work at all with i10n, above all if respecting an
80 cols width.

Wouldn't be better to generate the output in a temp buffer after l10n,
count the line, then choose whether to use a pager or not?

Can provide a patch if you deem so.

-- Daniele