On Tue, Aug 02, 2022 at 01:42:43PM +0000, Job Snijders wrote:
> Hi,
>
> We were doing a lot of waiting in connect() for some (currently) broken
> repositories. Move on with life after MAX_CONTIMEOUT seconds.
>
> This changeset reduces the real time spent fetching the RIPE TAL (with
> hot cache) from 7m14.57s to 3m20.10s on an IPv4+IPv6 dual-stacked host.
>
> This changeset depends on
> https://marc.info/?l=openbsd-tech&m=165944397404967&w=2
>
> Kind regards,
>
> Job
>
> Index: extern.h
> ===================================================================
> RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
> retrieving revision 1.143
> diff -u -p -r1.143 extern.h
> --- extern.h 27 Jun 2022 10:18:27 -0000 1.143
> +++ extern.h 2 Aug 2022 13:21:19 -0000
> @@ -727,6 +727,9 @@ int mkpathat(int, const char *);
> #define MAX_HTTP_REQUESTS 64
> #define MAX_RSYNC_REQUESTS 16
>
> +/* How many seconds to wait for a connection to succeed. */
> +#define MAX_CONTIMEOUT 15
> +
What makes you think that 15sec is enough to open connections in all
scenarios? I feel this is one of those changes that just shows that maybe
the current connect timeout from the system is too conservative.
> /* Maximum allowd repositories per tal */
> #define MAX_REPO_PER_TAL 1000
>
> Index: http.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 http.c
> --- http.c 24 May 2022 09:22:45 -0000 1.62
> +++ http.c 2 Aug 2022 13:21:20 -0000
> @@ -801,6 +801,9 @@ static enum res
> http_connect(struct http_connection *conn)
> {
> const char *cause = NULL;
> + struct pollfd pfd;
> + socklen_t optlen;
> + int optval, r;
>
> assert(conn->fd == -1);
> conn->state = STATE_CONNECT;
> @@ -834,10 +837,17 @@ http_connect(struct http_connection *con
> }
> }
>
> - if (connect(conn->fd, res->ai_addr, res->ai_addrlen) == -1) {
> - if (errno == EINPROGRESS) {
> - /* wait for async connect to finish. */
> - return WANT_POLLOUT;
> + if ((r = connect(conn->fd, res->ai_addr, res->ai_addrlen))
> + != 0 && errno == EINPROGRESS) {
> + pfd.fd = conn->fd;
> + pfd.events = POLLOUT;
> + if ((r = poll(&pfd, 1, MAX_CONTIMEOUT * 1000)) == 1) {
> + optlen = sizeof(optval);
> + if ((r = getsockopt(conn->fd, SOL_SOCKET,
> + SO_ERROR, &optval, &optlen)) == 0) {
> + errno = optval;
> + r = optval == 0 ? 0 : -1;
> + }
> } else {
> save_errno = errno;
> close(conn->fd);
This is not correct. You can't just poll for this connection and have all
other http connection wait until that is done. You need to handle this in
the main poll loop.
> Index: rsync.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/rpki-client/rsync.c,v
> retrieving revision 1.38
> diff -u -p -r1.38 rsync.c
> --- rsync.c 24 May 2022 09:20:49 -0000 1.38
> +++ rsync.c 2 Aug 2022 13:21:20 -0000
> @@ -312,6 +312,7 @@ proc_rsync(char *prog, char *bind_addr,
> args[i++] = "-rt";
> args[i++] = "--no-motd";
> args[i++] = "--max-size=" STRINGIFY(MAX_FILE_SIZE);
> + args[i++] = "--contimeout=" STRINGIFY(MAX_CONTIMEOUT);
> args[i++] = "--timeout=180";
> args[i++] = "--include=*/";
> args[i++] = "--include=*.cer";
>
--
:wq Claudio