Re: libpq async connection and multiple hosts
On Thu, 26 Oct 2023, 00:10 Jelte Fennema, wrote:

> On Wed, 25 Oct 2023 at 18:54, Daniele Varrazzo wrote:
> >
> > - connect_timeout
> > - multiple host, hostaddr, port
> > - load_balance_hosts=random
> >
> > Does this list sound complete?
>
> I think you'd also want to resolve the hostnames to IPs yourself and
> iterate over those one-by-one. Otherwise if the first IP returned for
> the hostname times out, you will never connect to the others.

For async connections we were already unpacking and processing the
hosts list, in order to perform non-blocking resolution and populate
the hostaddr. This already accounted for the possibility of one host
resolving to more than one address. But then we would have packed
everything back into a single conninfo and made a single connection
attempt.

https://github.com/psycopg/psycopg/blob/14740add6bb1aebf593a65245df21699daabfad5/psycopg/psycopg/conninfo.py#L278

The goal there was only non-blocking name resolution. What I now
understand we should do is to split on the hosts for sync connections
too, shuffle if requested, and make separate connection attempts.

-- Daniele
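For illustration, a minimal sketch of what "split on the hosts and make
separate connection attempts" could look like. This is not the psycopg code
linked above; the simplified parameter handling (equal-length comma-separated
`host`/`port` lists, TCP hosts only) is an assumption:

```
import socket

def resolve_attempts(params):
    """Expand a multi-host parameter dict into one attempt per address."""
    hosts = (params.get("host") or "").split(",")
    ports = (params.get("port") or "5432").split(",")
    if len(ports) == 1:
        ports *= len(hosts)
    attempts = []
    for host, port in zip(hosts, ports):
        for ans in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
            # Keep host (for TLS/auth checks) and pin the resolved address
            # in hostaddr, so each attempt targets exactly one IP.
            attempts.append({**params, "host": host, "port": port,
                             "hostaddr": ans[4][0]})
    return attempts
```

Each attempt would then be tried in turn, moving on to the next one on failure
or timeout, and the list could be shuffled first when load_balance_hosts=random
is requested.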
Re: libpq async connection and multiple hosts
On Wed, 25 Oct 2023 at 17:35, Jelte Fennema wrote:

> Another approach is to use tcp_user_timeout instead of connect_timeout
> to skip non-responsive hosts. It's not completely equivalent to
> connect_timeout though, since it also applies when the connection
> is actually being used. Also it only works on Linux afaik. It could be
> nice to add support for BSD's TCP_CONNECTIONTIMEOUT socket option.

This seems brittle and platform-dependent enough that we would surely
receive less grief by hardcoding a default two-minute timeout.

-- Daniele
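By way of example, a sketch of such a fallback; the 120-second figure only
mirrors the roughly two-minute behaviour observed in this thread and is an
assumption, not a libpq constant:

```
DEFAULT_CONNECT_TIMEOUT = 120.0  # assumed default, not a libpq value

def attempt_timeout(params):
    """Return the timeout to apply to a single connection attempt."""
    raw = params.get("connect_timeout")
    try:
        timeout = float(raw) if raw is not None else 0.0
    except ValueError:
        timeout = 0.0
    # libpq treats 0, negative, or unset as "wait indefinitely";
    # here we cap it so an unresponsive host can't stall forever.
    return timeout if timeout > 0 else DEFAULT_CONNECT_TIMEOUT
```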
Re: libpq async connection and multiple hosts
On Wed, 25 Oct 2023 at 17:35, Jelte Fennema wrote:

> You should implement load_balance_hosts=random though
> by randomizing your hosts list.

Good catch. So it seems that, if someone wants to build an async
equivalent of PQconnectdb, they need to handle on their own:

- connect_timeout
- multiple host, hostaddr, port
- load_balance_hosts=random

Does this list sound complete?

-- Daniele
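As a sketch, randomizing the hosts list on the client could be as simple as
the following; `attempts` is assumed to be a list of single-host parameter
dicts already split from the conninfo:

```
import random

def order_attempts(attempts, params):
    """Shuffle the per-host attempts when load_balance_hosts=random."""
    attempts = list(attempts)
    if params.get("load_balance_hosts") == "random":
        random.shuffle(attempts)
    return attempts
```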
libpq async connection and multiple hosts
Hello,

We are aware that, using async connection functions (`PQconnectStart`,
`PQconnectPoll`), the `connect_timeout` parameter is not supported; this
is documented at
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PQCONNECTSTARTPARAMS

"""
The connect_timeout connection parameter is ignored when using
PQconnectPoll; it is the application's responsibility to decide whether
an excessive amount of time has elapsed. Otherwise, PQconnectStart
followed by a PQconnectPoll loop is equivalent to PQconnectdb.
"""

However, ISTM that connecting to multiple hosts is not supported either.
I have a couple of issues I am looking into in psycopg 3:

- https://github.com/psycopg/psycopg/issues/602
- https://github.com/psycopg/psycopg/issues/674

Do we have to reimplement the connection attempts loop too? Are there
other policies that we would need to reimplement? Is
`target_session_attrs` taken care of by PQconnectPoll?

On my box (testing with psql and libpq itself),
PQconnectdb("host=8.8.8.8") fails after 2m10s. Is this the result of
some unspecified socket connection timeout on my Ubuntu machine?

If we need to reimplement async connection to "host=X,Y", we will need
to use a timeout even if the user didn't specify one, otherwise we will
never stop the connection attempt to X and move to Y. What timeout can
we specify that will not upset anyone?

Thank you very much

-- Daniele
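For reference, a minimal sketch of the loop the documentation leaves to the
application: driving a PQconnectStart/PQconnectPoll-style connection with an
overall deadline. The connect_start, connect_poll, and get_socket callables
are hypothetical stand-ins for whatever wrapper exposes those libpq functions:

```
import select
import time

def connect_async(connect_start, connect_poll, get_socket, timeout=120.0):
    deadline = time.monotonic() + timeout
    conn = connect_start()
    status = "writing"               # after PQconnectStart, wait for writability
    while status not in ("ok", "failed"):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("connection attempt timed out")
        sock = get_socket(conn)
        if status == "reading":
            ready, _, _ = select.select([sock], [], [], remaining)
        else:
            _, ready, _ = select.select([], [sock], [], remaining)
        if ready:
            status = connect_poll(conn)   # "reading" | "writing" | "ok" | "failed"
    if status == "failed":
        raise ConnectionError("connection attempt failed")
    return conn
```

With "host=X,Y" this loop would run once per attempt; on timeout or failure
the caller moves on to the next host instead of giving up.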
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Sat, 20 May 2023 at 00:01, Jacob Champion wrote:

> - Some clients in the wild (psycopg2/psycopg) suppress all notifications
>   during PQconnectPoll().

If there is anything we can improve in psycopg please reach out.

-- Daniele
$1 IS NULL with unknown type
Hello,

The operator `IS NULL` doesn't work if the argument has unknown type.
In psycopg 3:

    >>> conn.execute("select %s is null", ['foo']).fetchone()
    IndeterminateDatatype: could not determine data type of parameter $1

This can get in the way of using the unknown type for strings (but
specifying the text oid for strings is worse, because there is no
implicit cast from string to most types).

It doesn't seem necessary to specify a type for an argument if it only
has to be compared with null: nullness is independent from the type and
is even specified, in the query parameters, in a separate array from the
parameter values. Maybe this behaviour can be relaxed?

Cheers

-- Daniele
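For completeness, a reproduction and the usual client-side workaround, an
explicit cast in the query, which is exactly the kind of type pinning this
message argues shouldn't be needed for an IS NULL test (connection parameters
omitted):

```
import psycopg

with psycopg.connect() as conn:
    try:
        conn.execute("select %s is null", ["foo"])
    except psycopg.errors.IndeterminateDatatype:
        conn.rollback()              # the failed statement aborted the transaction
    # The explicit cast lets the server resolve $1's type, at the cost
    # of pinning it to text.
    cur = conn.execute("select %s::text is null", ["foo"])
    print(cur.fetchone())            # (False,)
```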
Re: Libpq single-row mode slowness
On Sun, 1 May 2022 at 23:12, Tom Lane wrote:

> The usual expectation is that you call PQconsumeInput to get rid of
> a read-ready condition on the socket. If you don't have a poll() or
> select() or the like in the loop, you might be wasting a lot of
> pointless recvfrom calls. You definitely don't need to call
> PQconsumeInput if PQisBusy is already saying that a result is available,
> and in single-row mode it's likely that several results can be consumed
> per recvfrom call.

This makes sense and, with some refactoring of our fetch loop, the
overhead of using single-row mode is now down to about 3x, likely caused
by the greater overhead in Python calls.

Please note that the insight you gave in your answer seems to contradict
the documentation. Some excerpts from
https://www.postgresql.org/docs/current/libpq-async.html:

"""
PQconsumeInput: "After calling PQconsumeInput, the application can check
PQisBusy and/or PQnotifies to see if their state has changed"

PQisBusy: "will not itself attempt to read data from the server;
therefore PQconsumeInput must be invoked first, or the busy state will
never end."

...

A typical application [will use select()]. When the main loop detects
input ready, it should call PQconsumeInput to read the input. It can
then call PQisBusy, followed by PQgetResult if PQisBusy returns false
(0).
"""

All these indications give the impression that there is a sort of
mandatory order, requiring one to call PQconsumeInput first, then
PQisBusy. As a consequence, the core of our function to fetch a single
result was implemented as:

```
def fetch(pgconn):
    while True:
        pgconn.consume_input()
        if not pgconn.is_busy():
            break
        yield Wait.R
    return pgconn.get_result()
```

(Where the `yield Wait.R` suspends this execution to call into select()
or whatever waiting policy the program is using.)

Your remarks suggest that PQisBusy() can be called before
PQconsumeInput(), and that the latter doesn't need to be called if not
busy. As such I have modified the loop to be:

```
def fetch(pgconn):
    if pgconn.is_busy():
        yield Wait.R
    while True:
        pgconn.consume_input()
        if not pgconn.is_busy():
            break
        yield Wait.R
    return pgconn.get_result()
```

which seems to work well: tests don't show regressions and single-row
mode doesn't waste recvfrom() anymore.

Is this new fetching pattern the expected way to interact with the
libpq? If so, should we improve the documentation to suggest that there
are reasons to call PQisBusy before PQconsumeInput? Especially on the
single-row mode docs page, which doesn't make any relevant mention of
these functions.

Thank you very much for your help, really appreciated.

-- Daniele
Libpq single-row mode slowness
Hello,

a Psycopg 3 user has tested what boils down pretty much to a
"generate_series(100K)" and has reported a 100x difference between
fetching it normally and in single-row mode. I have repeated the test
myself and I have found a 50x difference (the OP doesn't specify which
platform they are using, mine is Linux).

Of course calling PQconsumeInput 100K times has some overhead compared
to calling it only once. However, I would like to know if this level of
overhead is expected, or if instead anyone smells some wasted cycles.
According to some profiling, the largest part of the time is spent
inside a libc function I don't know the symbol of, called by
pqReadData(). Details and pretty graphs are available at
https://github.com/psycopg/psycopg/issues/286

The operations we perform, for every row, are PQconsumeInput, PQisBusy,
PQgetResult. Every PQconsumeInput results in a recvfrom() syscall, of
which the first one returns the whole recordset and the following ones
return EAGAIN. There are two extra cycles: one to get the TUPLES_OK
result, one to get the end-of-stream NULL. It seems to be the documented
usage pattern, but I'm not sure whether I'm misreading it, especially in
the light of this libpq grumble [1].

[1] https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L681

Our connection is in non-blocking mode and we see the need for waiting
(using epoll here) only on the first call. The resulting strace of the
entire query process (of two rows) is:

21:36:53.659529 sendto(3, "P\0\0\0>\0select 'test' a, t b from "..., 108, MSG_NOSIGNAL, NULL, 0) = 108
21:36:53.660236 recvfrom(3, 0x1f6a870, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
21:36:53.660589 epoll_create1(EPOLL_CLOEXEC) = 4
21:36:53.660848 epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT, {u32=3, u64=3}}) = 0
21:36:53.661099 epoll_wait(4, [{EPOLLIN, {u32=3, u64=3}}], 1023, 10) = 1
21:36:53.661941 recvfrom(3, "1\0\0\0\0042\0\0\0\4T\0\0\0.\0\2a\0\0\0\0\0\0\0\0\0\0\31\377\377\377"..., 16384, 0, NULL, NULL) = 117
21:36:53.662506 close(4) = 0
21:36:53.662830 recvfrom(3, 0x1f6a898, 16344, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
21:36:53.663162 recvfrom(3, 0x1f6a884, 16364, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
21:36:53.663359 recvfrom(3, 0x1f6a876, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

The test is on a localhost connection with sslmode disabled using libpq
14.2. Is this the correct usage? Any insight is welcome.

Thank you very much!

-- Daniele
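The per-row loop described above boils down to something like the following
sketch; `pgconn` stands for a thin wrapper over libpq (the method names mirror
psycopg's pq layer) and `wait_readable` is a hypothetical helper that blocks
until the socket is readable:

```
def stream_rows(pgconn, query, wait_readable):
    pgconn.send_query(query)
    pgconn.set_single_row_mode()
    while True:
        # one PQconsumeInput / PQisBusy / PQgetResult cycle per row
        pgconn.consume_input()
        while pgconn.is_busy():
            wait_readable(pgconn.socket)
            pgconn.consume_input()
        res = pgconn.get_result()
        if res is None:              # end-of-stream NULL
            return
        yield res                    # SINGLE_TUPLE rows, then the final TUPLES_OK
```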
Is it safe to use the extended protocol with COPY?
Hello,

in psycopg 3 we are currently using PQexecParams - although with no
params - to send COPY commands. The reason is mostly to avoid people
sending COPY together with other statements, especially if other
operations are chained after COPY: we would only notice them after the
copy is finished. Data changes might have been applied by then, so
throwing an exception seems impolite (the result might have been applied
already), but managing the result is awkward too.

Someone [1] has pointed out this conversation [2], which suggests that
COPY with the extended protocol might break in the future.

[1] https://github.com/psycopg/psycopg/issues/78
[2] https://www.postgresql.org/message-id/flat/CAMsr%2BYGvp2wRx9pPSxaKFdaObxX8DzWse%2BOkWk2xpXSvT0rq-g%40mail.gmail.com#camsr+ygvp2wrx9ppsxakfdaobxx8dzwse+okwk2xpxsvt0r...@mail.gmail.com

As far as PostgreSQL is concerned, would it be better to stick to PQexec
with COPY, and if people append statements afterwards they would be the
ones to deal with the consequences (the server applying the changes, the
client throwing an exception)?

Thank you very much

-- Daniele
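For illustration, the difference at the wrapper level; exec_ and exec_params
here mirror psycopg's low-level pq wrapper over PQexec and PQexecParams, and
`pgconn` is an assumed open low-level connection:

```
# Simple protocol: the server executes both statements, and the extra
# result only surfaces after the COPY has finished.
pgconn.exec_(b"COPY t FROM STDIN; SELECT 1")

# Extended protocol: only a single statement is accepted, so the
# multi-statement string is rejected up front -- the "guard" described
# above.
pgconn.exec_params(b"COPY t FROM STDIN; SELECT 1", [])
```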
Re: Is a connection max lifetime useful in a connection pool?
On Sun, 21 Feb 2021 at 19:12, Pavel Stehule wrote:

> I have very strong experience - it is very useful.

On Sun, 21 Feb 2021 at 19:26, Stephen Frost wrote:

> Short answer- yes.

Sounds good. Thank you very much for your insight!

-- Daniele
Is a connection max lifetime useful in a connection pool?
Hello,

I am designing and implementing a connection pool for psycopg3 [1][2].
Some of the inspiration is coming from HikariCP [3], a Java connection
pool.

One of the HikariCP configuration parameters is "maxLifetime", whose
description is: "This property controls the maximum lifetime of a
connection in the pool. [...] **We strongly recommend setting this
value, and it should be several seconds shorter than any database or
infrastructure imposed connection time limit.**" (bold is theirs,
default value is 30 mins).

When discussing the pool features in the psycopg mailing list someone
pointed out "what is the utility of this parameter? connections don't
rot, do they?" Hikari is a generic connection pool, not one specific to
Postgres. So I'm wondering: is there any value in periodically deleting
and recreating connections for a Postgres-specific connection pool? Is
a maxLifetime parameter useful?

Thank you very much,

-- Daniele

[1]: https://www.psycopg.org/articles/2021/01/17/pool-design/
[2]: https://github.com/psycopg/psycopg3/blob/connection-pool/psycopg3/psycopg3/pool.py
[3]: https://github.com/brettwooldridge/HikariCP
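For illustration, a minimal sketch of how a pool could implement such a
parameter; this is not the actual psycopg3 pool code, and the 10% expiry
jitter is an assumption borrowed from how HikariCP-style pools avoid mass
reconnections:

```
import random
import time

class PoolEntry:
    def __init__(self, conn, max_lifetime=30 * 60):
        self.conn = conn
        # Spread expirations over ~10% of max_lifetime so that many
        # connections created together don't all retire at once.
        self.expires_at = time.monotonic() + max_lifetime * random.uniform(0.9, 1.0)

    def expired(self):
        return time.monotonic() >= self.expires_at
```

When an entry is returned to the pool (or touched by a maintenance task) and
`expired()` is true, the connection would be closed and a fresh one opened in
its place.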
Types info on binary copy
Hello,

The PGresult structure returned on COPY ... FROM provides some
information (https://www.postgresql.org/docs/13/libpq-copy.html): the
number of fields expected and whether they are binary or not. However it
doesn't provide the types expected.

For binary copy the exact type is important: playing with it, it seems
that no cast whatsoever is applied and the errors reported are
relatively low level. For instance, passing an int4 value to an int8
field causes a "protocol_violation: insufficient data left in message";
the other way around is greeted with "invalid_binary_representation:
incorrect binary data format".

Naively, it would seem that once a "COPY ... FROM STDIN" is executed
successfully, the server has a pretty good idea of what data types it is
expecting. I'm wondering: is it absurd to ask for this info to be
returned as a RowDescription and to be exposed by the libpq PQftype()?

Cheers

-- Daniele
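To make the int4/int8 example concrete, here is a sketch of why the client
needs the exact type: it must pick the on-the-wire size itself, with no
server-side cast to fix a mismatch. The field layout follows the documented
binary COPY format (per row, an int16 field count, then an int32 length and
payload per field):

```
import struct

def binary_copy_row_int(value, as_int8=False):
    """Encode a one-column row holding an integer for binary COPY."""
    payload = struct.pack("!q", value) if as_int8 else struct.pack("!i", value)
    return struct.pack("!h", 1) + struct.pack("!i", len(payload)) + payload
```

Sending the 8-byte form to an int4 column (or vice versa) yields the low-level
errors quoted above, which is why knowing the expected types up front would
help.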
Use of pager, help, localization:
From psql/help.c:

/*
 * Keep this line count in sync with the number of lines printed below!
 * Use "psql --help=variables | wc" to count correctly; but notice that
 * Windows builds currently print one more line than non-Windows builds.
 * Using the larger number is fine.
 */
output = PageOutput(156, pager ? &(pset.popt.topt) : NULL);

uhm... this doesn't work at all with l10n, above all if respecting an 80
cols width. Wouldn't it be better to generate the output in a temp
buffer after l10n, count the lines, then choose whether to use a pager
or not? Can provide a patch if you deem so.

-- Daniele
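As a sketch of the suggested approach (not psql code, just the shape of it):
render the localized help into a buffer first, count its lines, and only then
decide about the pager. `page_output` and `terminal_rows` are hypothetical
stand-ins for PageOutput and the terminal height:

```
def show_help(render_localized_help, page_output, terminal_rows, use_pager=True):
    text = render_localized_help()    # build the whole localized output first
    n_lines = text.count("\n")        # count after translation and wrapping
    if use_pager and n_lines > terminal_rows:
        page_output(text)             # long output goes through the pager
    else:
        print(text, end="")
```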