On Tue, Mar 4, 2025 at 2:44 PM Jacob Champion <jacob.champ...@enterprisedb.com> wrote:
> Maybe. My first attempt gets all the BSDs green except macOS -- which
> now fails in a completely different test, haha... -_-

Small update: there is not one bug, but three that interact. ಠ_ಠ

1) The test server advertises an issuer of `https://localhost:<port>`, but it doesn't listen on all localhost interfaces. When Curl tries to contact the issuer on IPv6, its Happy Eyeballs handling usually falls back to IPv4 after discovering that IPv6 is nonfunctional, but occasionally it contacts something that was temporarily listening there instead.

Since I don't really want to write a bunch of IPv6 fallback code for the test server -- this should be testing OAuth, not finding all the ways that buildfarm OSes can expose dual stack sockets -- I changed the issuer to be IPv4-only. When I did this, the interval timing tests immediately failed on macOS.

2) macOS's EVFILT_TIMER implementation seems to be different from the other BSDs. On Mac, when you re-add a timer to a kqueue, any existing timer-fired events for it are not cleared out and the kqueue might remain readable. This breaks a postcondition of our set_timer() function, which is that new timeouts are supposed to completely replace previous timeouts.

With a dual stack issuer, the Happy Eyeballs timeouts would be routinely cleared out by libcurl, setting up a clean slate for the next call to set_timer(). But with an IPv4-only issuer, libcurl didn't need to clear out the timeouts (they'd already fired), which meant that our call to set the ping interval was ineffective.

3) There is a related performance bug on other platforms. If a Curl timeout happens partway through a request (so libcurl won't clear it), the timer-expired event will stay set and CPU will be burned to spin pointlessly on drive_request(). This is much easier to notice after taking Happy Eyeballs out of the picture. It doesn't cause logical failures -- Curl basically discards the unnecessary calls -- but it's definitely unintended.

--

Problem 1 is a simple patch. I am working on a fix for Problem 2, but I got stuck trying to get a "perfect" solution working yesterday...
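
For comparison, here is a minimal sketch of the set_timer() postcondition on Linux, assuming a timerfd-backed timer. That is an assumption: this mail only discusses kqueue, and the helpers set_timerfd(), disarm_timerfd(), is_readable() and sleep_ms() are hypothetical, not libpq's set_timer(). It shows that re-arming a timerfd replaces the old timeout and clears a stale expiration (the behavior set_timer() expects, and what EVFILT_TIMER on macOS was observed not to do), and that a fired timer nobody resets stays readable, which is the spin described in Problem 3.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not libpq code): arm tfd as a one-shot timer that
 * fires after timeout_ms. timerfd_settime() replaces any earlier setting,
 * and on Linux it also discards expirations that fired but were never
 * read, so the fd stops being readable: "new timeout replaces old". */
static bool set_timerfd(int tfd, long timeout_ms)
{
    struct itimerspec spec;

    assert(timeout_ms > 0);          /* an all-zero it_value would disarm */
    memset(&spec, 0, sizeof(spec));  /* it_interval stays zero: one-shot */
    spec.it_value.tv_sec = timeout_ms / 1000;
    spec.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;
    return timerfd_settime(tfd, 0, &spec, NULL) == 0;
}

/* Hypothetical helper: disarm tfd (all-zero it_value). Calling this once a
 * timeout has been handled keeps the fired timer from leaving the fd
 * readable forever, i.e. avoids a Problem 3 style busy loop. */
static bool disarm_timerfd(int tfd)
{
    struct itimerspec spec;

    memset(&spec, 0, sizeof(spec));
    return timerfd_settime(tfd, 0, &spec, NULL) == 0;
}

/* Zero-timeout poll: is fd readable right now? Consumes nothing. */
static bool is_readable(int fd)
{
    struct pollfd pfd = {.fd = fd, .events = POLLIN};

    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}

/* Sleep roughly ms milliseconds, used to let a short timer expire. */
static void sleep_ms(long ms)
{
    struct timespec ts = {.tv_sec = ms / 1000,
                          .tv_nsec = (ms % 1000) * 1000000L};

    /* On EINTR, nanosleep() stores the remaining time back into ts. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}
```

A test along the lines of the "stronger unit tests" idea might create the fd with timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC), arm it for 1 ms, sleep past expiry, and see is_readable() return true (repeatedly, since poll() reads nothing); re-arm it for 10 s and see it return false; then let a 1 ms timer fire again, call disarm_timerfd(), and see it return false. A kqueue version of the same check would be the interesting one on macOS, since that is where re-adding EVFILT_TIMER did not clear the fired event.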

Since Problem 2 is a partial(?) blocker for getting NetBSD going, I'm going to pivot to an ugly-but-simple approach today. I plan to defer working on Problem 3, which should just be a performance bug, until the tests are green again. And I would like to eventually add some stronger unit tests for the timer behavior, to catch other potential OS-specific problems in the future.

Thanks,
--Jacob