Re: Viewport for man.openbsd.org -- readability on phones
On Thu, 17 May 2018 18:32:44 -0400 Aner Perez wrote:
> First non-comment line of mandoc.css says:
>
> html {max-width: 100ex; }
>
> Removing this line allows the use of the full browser width. I'm
> sure that it was put there for a reason (maybe to approximate the
> width of a terminal?).

Some browsers simply don't calculate lengths expressed in exes correctly -- I've seen that in many other contexts. Last time I checked (about 3 years ago, so it might well have changed since), two of the four most common browsers still exhibited that fault.

As a quick experiment, try looking up the metrics of the font your browser actually uses to render man pages, then convert 100ex into ems for your font and put the result in the max-width property in your local copy of mandoc.css. If that fixes your width issue then you'll have clear evidence that the bug lies in the browser (specifically in its routine for converting exes to whatever its native display length unit is).
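A minimal sketch of that conversion, assuming you've already read the x-height ratio out of your font's metrics (the 0.52 below is a made-up example value, not a real measurement):

```python
# ex -> em conversion for the mandoc.css max-width workaround.
# The 0.52 x-height ratio is a placeholder; substitute the actual
# x-height / em-size ratio from your browser font's metrics.

def ex_to_em(ex, x_height_ratio):
    """Convert a length in ex units to em units.

    x_height_ratio is the font's x-height divided by its em size
    (typically somewhere around 0.45-0.55 for common text faces).
    """
    return ex * x_height_ratio

if __name__ == "__main__":
    # mandoc.css's 100ex with the hypothetical 0.52 ratio:
    print("max-width: %gem;" % ex_to_em(100, 0.52))
```
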
Re: httpd stops accepting connections after a few hours on current
On Wed, 2015-07-15 at 21:41 +0930, Jack Burton wrote:
> The fix is trivial -- see attached patch (against 5.7-stable -- sorry,
> I don't have any hosts running -current at present).
> ...
> [demime 1.01d removed an attachment of type text/x-patch which had a
> name of httpd_server_accept_tls.patch; charset=UTF-8]

Sorry, didn't realise I couldn't post a patch to misc@ (I've never needed to before). Please excuse my ignorance, but what is the accepted way to contribute a patch?
Re: httpd stops accepting connections after a few hours on current
On Mon, 2015-07-13 at 16:19 +0200, Tor Houghton wrote:
> On Mon, Jul 13, 2015 at 10:52:46PM +0930, Jack Burton wrote:
> > > I don't pretend to know httpd (at all), but I'm wondering, what
> > > should fstat(1) say, over time, for the httpd processes?
> >
> > Thanks Tor -- that was exactly the clue I needed to isolate the
> > problem.
> > [snip]
> > admin talks to a custom FastCGI daemon, which is most likely the
> > culprit -- I'll debug it tomorrow.
> ...
> I am not sure you should conclude yet. I don't use FastCGI. ;-}
> Now, as I write, I have 218 open fd's, compared to the 206 or whatever
> I had in my previous post. I've got a few dangling :443 streams (the
> :80 ones seem to disappear like they should), and then a bunch of
> these:

You're absolutely right -- I spoke too soon.

After double-checking that every possible path a request could take through the custom FastCGI daemon used by admin ends by sending an FCGI_END_REQUEST record back to httpd (it does), I turned my attention back to the httpd logs and the debug messages gathered. This time I had my little script check the remote IP addresses of those sockets against all the httpd access logs (not just the current ones) and, where nothing matched there, finally check the httpd debug output too.

Again, only the admin server (the only one here that's Internet-facing) had stale sockets (all open sockets for redir and portal matched log entries) -- out of 26 open sockets, 4 matched log entries for current HTTPS sessions, 2 matched "buffer event error" debug messages, and the other 20 didn't match in either the logs or the debug messages.

I still don't know what's causing the "buffer event error" messages, but as they accounted for only 2 of the 22 stale sockets, I figured it was more important to focus on the other 20 first.

So, what sort of HTTPS event doesn't make it into the logs and doesn't cause any debug messages containing the remote IP address to be emitted either?
The only thing I could think of was a TCP connection to port 443 where the remote end doesn't initiate a TLS handshake (that's nowhere near as improbable as it sounds: think of a simple port scan, or a network outage commencing directly after the first ACK).

So, as a test, I tried just that: establishing a TCP session from a remote host, then closing it without sending anything at all at layer 5.

Naturally, doing that where httpd expects plain HTTP causes only a single debug message to be emitted ("..., done"), and the socket gets closed as expected. But doing it where httpd expects HTTPS, the local side of the socket remains open, nothing appears in the regular logs, and nothing identifiable by remote IP address appears in the debug messages either.

Trying to match log/debug entries that aren't identified by the remote IP address on a host with even a modest amount of traffic struck me as an exercise in futility, so I tried the same experiment on another host (also running 5.7-stable) with no other load on httpd at all. The result was the same: httpd did not close the socket or log anything in the regular logs. However, one debug message was emitted, our old friend:

server_accept_tls: TLS accept failed - (null)...

...which brings us right back to where this thread started.

Looking at the source, server_accept_tls() handles two types of non-recoverable error condition: timeout after retry, and outright failure. In the first case (EV_TIMEOUT), server_accept_tls() calls server_close() (which in turn calls server_close_http(), which closes the socket) before returning; in the second case it does not. I believe this is the bug we've been looking for.

The fix is trivial -- see attached patch (against 5.7-stable -- sorry, I don't have any hosts running -current at present). That works for me (tested here on two hosts: sparc64 with test load only; and amd64 with modest production load).
Not sure if that's the best approach or not, but now that we've at least established the root cause, if there's a better way I'm sure someone else on the list will point it out.

[demime 1.01d removed an attachment of type text/x-patch which had a name of httpd_server_accept_tls.patch; charset=UTF-8]
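For anyone wanting to reproduce this against a test instance, the experiment described above (complete the TCP handshake, then close without ever sending a TLS ClientHello) can be sketched roughly like this -- host and port are placeholders:

```python
# Sketch of the reproduction described above: open a TCP connection to
# an HTTPS listener and close it without sending a single byte at
# layer 5, i.e. without ever starting the TLS handshake.
# Point it at a test instance, not at production.

import socket

def connect_and_abandon(host, port, timeout=5.0):
    """Complete the TCP handshake, then close without sending anything.

    Returns True if the TCP connection was established, False otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # close immediately; no TLS ClientHello is ever sent
        return True
    except OSError:
        return False

if __name__ == "__main__":
    # "test.example" is a placeholder for your test host.
    print(connect_and_abandon("test.example", 443))
```
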
Re: httpd stops accepting connections after a few hours on current
On Wed, 2015-07-15 at 12:56, Mike Burns wrote:
> On 2015-07-15 21.49.11 +0930, Jack Burton wrote:
> > Sorry, didn't realise I couldn't post a patch to misc@ (I've never
> > needed to before). Please excuse my ignorance, but what is the
> > accepted way to contribute a patch?
>
> Post it to tech@.

Done. See the post to tech@ titled "httpd: patch to close TLS sockets that fail before TLS handshake".
Re: httpd stops accepting connections after a few hours on current
On Mon, 2015-07-13 at 11:02 +0200, Tor Houghton wrote:
> On Sun, Jul 12, 2015 at 07:56:37PM +0930, Jack Burton wrote:
> > It is possible I simply failed to provision sufficient capacity --
> > which could easily be fixed by adding a login class for www with a
> > higher limit on open fds -- but I fear that might just be hiding the
> > problem rather than addressing it: exhausting a 512 fd limit with a
> > peak load of only 48 req/sec (and average load of 2 req/sec) just
> > doesn't feel right (especially when that peak load is all 303s
> > generated internally by httpd, which each take only a tiny fraction
> > of a second to process).
>
> I don't pretend to know httpd (at all), but I'm wondering, what should
> fstat(1) say, over time, for the httpd processes?

Thanks Tor -- that was exactly the clue I needed to isolate the problem.

Wrote a short script to parse the output of running fstat -p for each running httpd (we're running with prefork 8, so I didn't fancy doing it by hand), and report the timestamp of the last request in the relevant access log for each client IP with an open socket (or "missing" if there's no entry in the current access log).

Ran it roughly 4 hours after the last log rotation and found only 34 matches out of 73 open sockets. We don't run anything here that would take anywhere near 4 hours to return a response, so the 39 that didn't match entries in any of the current access logs were clearly where I needed to look.

All 39 related to admin -- the one HTTPS server that I hadn't spent any time looking into (since it accounts for only 0.02% of httpd's load here, it didn't occur to me that that tiny little thing could be bringing httpd to its knees ... famous last words). admin talks to a custom FastCGI daemon, which is most likely the culprit -- I'll debug it tomorrow.
portal (the other HTTPS server) also talks to a (different) custom FastCGI daemon, but carries orders of magnitude more traffic and didn't have any stale sockets -- so clearly our problem is at the other end of admin's FastCGI socket (not with httpd itself). Sorry for the noise.

Ted -- similarly, you may want to look into whatever is at the other end of your server1's FastCGI socket. If your issue is the same as ours, that's likely where you'll find the cause.
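The matching script described above wasn't posted; a rough reconstruction of the idea might look like this. The fstat line shape assumed here (peer address as the last addr:port field on "internet stream tcp" lines) is an approximation -- check it against your own fstat(1) output before relying on it:

```python
# Rough reconstruction (not the original script) of the check described
# above: pull the remote peer addresses of established TCP sockets out
# of `fstat -p <pid>` output, then report which ones never appear in
# the access logs. The assumed fstat line format is an approximation.

import re

# Assumes the peer address is the last "a.b.c.d:port" field on lines
# mentioning "internet stream tcp" -- verify against real fstat output.
PEER_RE = re.compile(r'internet stream tcp .* (\d+\.\d+\.\d+\.\d+):\d+$')

def peer_ips_from_fstat(fstat_output):
    """Extract remote peer IPs from fstat-style output text."""
    ips = set()
    for line in fstat_output.splitlines():
        m = PEER_RE.search(line.strip())
        if m:
            ips.add(m.group(1))
    return ips

def unmatched_peers(fstat_output, access_log_text):
    """Peer IPs with open sockets but no entry in the access logs.

    Uses a crude substring match against the concatenated log text.
    """
    return {ip for ip in peer_ips_from_fstat(fstat_output)
            if ip not in access_log_text}
```
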
Re: httpd stops accepting connections after a few hours on current
On Sat, 2015-07-11 at 15:38 +0930, Jack Burton wrote:
> It hasn't happened here in a few days now so I don't have a log
> extract on hand to share (but can post one next time it happens).

Okay, the issue returned this afternoon and the httpd debug output certainly sheds more light on the problem.

This time we didn't see either the TLS or buffer event errors anywhere near the time at which httpd stopped responding to requests. Instead, we're getting "server_accept: deferring connections". According to the comments in server.c, that means we're running out of file descriptors.

That struck me as odd, as our traffic generally isn't anywhere near high enough to expect that, so I checked the traffic at the time and there was indeed a spike, although it didn't seem high enough to cause issues. Peak load was 48 requests in the one second before httpd stopped responding to requests. All 48 of those requests were to the trivial http server, whose config is just:

listen on $int_addr port 80
block return 303 https://portal.tvir.acscomp.net

(yes, I know that that hostname doesn't resolve publicly -- but it does when using the resolver assigned by dhcp on the semi-public [but not Internet-facing] network on which our httpd listens)

As an aside, I didn't see in the debug output any requests during that final second [although there were two a couple of seconds later] to the target https server portal (which is served by the same instance of httpd) -- but I guess it's possible that all 48 clients either didn't act on the 303 or already had its target in their caches (the environment is a residential building for tertiary students, so the user base is fairly static at this time of year -- so it seems well within the realms of possibility that all 48 had / on portal cached).
Debug output at the time httpd stopped responding reads (after 47 other requests to the trivial http server, all timestamped 16:08:54):

redir 192.168.137.160 - - [12/Jul/2015:16:08:54 +0930] GET /personal HTTP/1.1 303 0
server redir, client 119933 (505 active), 192.168.137.160:40521 - 192.168.137.1, https://portal.tvir.acscomp.net (303 See Other)
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server redir, client 119935 (505 active), 192.168.137.160:45643 - 192.168.137.1, done
server redir, client 119934 (504 active), 192.168.137.160:40526 - 192.168.137.1, done
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server redir, client 119936 (505 active), 192.168.137.160:47925 - 192.168.137.1, done
server_accept: deferring connections
server_accept: deferring connections
server redir, client 119938 (505 active), 192.168.137.160:40528 - 192.168.137.1, done
server redir, client 119937 (504 active), 192.168.137.160:40527 - 192.168.137.1, done
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server redir, client 119940 (505 active), 192.168.137.160:37213 - 192.168.137.1, done
server_accept: deferring connections
server_accept: deferring connections
portal.tvir.acscomp.net 192.168.137.99 - - [12/Jul/2015:16:08:56 +0930] GET / HTTP/1.1 200 0
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
server_accept: deferring connections
portal.tvir.acscomp.net 192.168.137.112 - - [12/Jul/2015:16:08:57 +0930] GET / HTTP/1.1 200 0
server_accept: deferring connections

Then nothing but "server_accept: deferring connections" over and over again.
It is possible I simply failed to provision sufficient capacity -- which could easily be fixed by adding a login class for www with a higher limit on open fds -- but I fear that might just be hiding the problem rather than addressing it: exhausting a 512 fd limit with a peak load of only 48 req/sec (and an average load of 2 req/sec) just doesn't feel right (especially when that peak load is all 303s generated internally by httpd, which each take only a tiny fraction of a second to process).

I notice in the source that server_close_http() is responsible for freeing session-specific fds, and that it's called from server_close(), which is also responsible for generating the "..., done" debug messages and decrementing the active client count. We're only seeing those "..., done" messages in the debug output for a small proportion of completed HTTP sessions, and the active client count continues to grow (and only falls occasionally), even when there is much less HTTP traffic.

It seems as if some HTTP sessions get their fds freed on completion while others don't ... but I can't find anything in the source to support that conjecture. Could someone who's more familiar with httpd than I am offer a clue please?
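One way to check the conjecture above from the debug output alone is to tally client IDs and see which ones never log a ", done". A rough sketch, based only on the message shapes visible in the debug excerpt earlier in this thread:

```python
# Tally httpd debug output to test the conjecture above: every
# "server <name>, client <id> (<n> active), ..." line touches a
# session, and only the ones ending in ", done" have been closed by
# server_close(). Message shapes are inferred from the debug excerpt
# in this thread, not from httpd documentation.

import re

CLIENT_RE = re.compile(r'^server \S+, client (\d+) \(\d+ active\)')

def unclosed_clients(debug_text):
    """Return client IDs seen in debug output that never logged ', done'."""
    seen, done = set(), set()
    for line in debug_text.splitlines():
        m = CLIENT_RE.match(line.strip())
        if not m:
            continue  # e.g. "server_accept: deferring connections"
        cid = m.group(1)
        seen.add(cid)
        if line.rstrip().endswith(', done'):
            done.add(cid)
    return seen - done
```

If the bug is as conjectured, this set should grow steadily over time even under light load.
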
Re: httpd stops accepting connections after a few hours on current
On Thu, 2015-07-09 at 11:59 +0200, Tor Houghton wrote:
> On Wed, Jul 08, 2015 at 10:04:27PM -0500, Theodore Wynnychenko wrote:
> [snip]
> > server https://server2.tldn.com, client 2067 (63 active),
> > 10.0.28.254:60330 - 10.0.28.130:443, buffer event error
> [..]
> > server https://server2.tldn.com, client 2068 (63 active),
> > 10.0.28.254:52350 - 10.0.28.130:443, buffer event error
>
> I'm going to "me too" on this one (have not been until now, as I
> thought perhaps it was due to my setup, and therefore off-topic).

Likewise, seeing the same behaviour here on 5.7-stable -- so the problem is not confined to -current.

Fairly small, simple httpd setup here: httpd configured with 3 server stanzas -- 2 HTTPS-only (both using FastCGI) plus one trivial HTTP-only (just a block return 303 pointing to one of the HTTPS servers). Quite a light load too (averaging 178k requests/day -- about 2/sec).

Frequency of the problem varies wildly -- sometimes it occurs only an hour or two after the last httpd restart, and at other times httpd will last for up to 4 days before it stops responding to requests. Variation in volume of requests appears to have no effect on frequency of recurrence either.

On every occasion, httpd continues to respond correctly to signals (httpd restarts are always clean), just not to HTTP[S] requests. On at least one occasion, the http socket continued to respond correctly to requests whilst the two https ones stopped responding. On other occasions, all 3 stopped responding at around the same time.

When a socket stops responding, it still accepts requests but httpd neither logs (at least, when not in debug mode) nor responds to them (i.e. I can successfully open a TCP session to the listening socket and send it a request, but nothing comes back after the initial ACK).

It hasn't happened here in a few days now so I don't have a log extract on hand to share (but can post one next time it happens).
From memory, in the past we were seeing "TLS accept failed" errors in the logs, as reported by the original poster, but not at the time the sockets stopped responding (only well beforehand), so I'd also assumed that those were unrelated.

Running tcpdump on both user-facing interfaces (and on pflog0, just to rule out the possibility of some error in our pf.conf) whilst httpd was not responding to requests on previous occasions revealed nothing new.

Have tried watching debug output a couple of times before, but it rapidly gets quite unwieldy, even with our modest load (especially over a remote ssh session -- both uplinks at that site are nearing capacity), given the length of time it can take for the problem to manifest (on each occasion I gave up after a few hours without the problem occurring). Am now running httpd -dvvv with stdout/err redirected to a temporary log file (probably should have done that in the first place).

We are already seeing (after less than a minute) entries in the debug logs similar to those reported by Theodore, for example:

* On an HTTPS server (using FastCGI):

server portal, client 305 (14 active), 192.168.137.161:52224 - 192.168.137.1:443, buffer event error

* On the trivial HTTP server (using just a block return 303):

server redir, client 132 (11 active), 192.168.137.100:61081 - 192.168.137.1, buffer event timeout

However, the original problem (httpd stops responding to requests) is *not* occurring at present. Will post a debug log extract and httpd.conf next time the problem recurs (should be within the next few days).
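As an aside, the manual liveness check described earlier in this thread (open a TCP session, send a request, see whether anything comes back) can be scripted roughly as follows -- host, port and the request line are placeholders:

```python
# Sketch of the manual liveness check described in this thread: open a
# TCP session to the listening socket, send a minimal request, and see
# whether any bytes come back before a timeout. Host and port are
# placeholders for the server under test.

import socket

def probe_http(host, port, timeout=10.0):
    """Send a minimal HTTP request and report whether any bytes return.

    Returns True if the server sent something back; False if the
    connection could not be established, or succeeded but then went
    silent (the failure mode seen in this thread).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"HEAD / HTTP/1.0\r\nHost: %b\r\n\r\n" % host.encode())
            s.settimeout(timeout)
            return len(s.recv(1)) > 0
    except OSError:
        return False
```

Run from cron against each listening socket, this would at least flag the hang promptly instead of waiting for user reports.
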