This is my personal attempt to flesh out some of the things we discussed at VDD25Q2 in a bit more detail.
A) More modular VCL
-------------------

Points of pain:

    "Having everything in one VCL file"
    "Slow VCL compiles"
    "Useless backend state/statistics reporting."

A) Diagnosis:

In complex setups, you either end up with a tangled VCL file that does many
different things in a lot of conditional clauses, or you end up with a less
tangled VCL file that tries to determine which of multiple VCL files should
handle this particular request.

If you do the latter, you have to repeat the backend declarations in many of
the VCL files, which causes fragmented backend statistics.

A) Concrete proposals:

A.1) Make it possible to import and export backends (= directors) between VCLs.

To do this, we must discard one original dogma:

    "There is *exactly* one active VCL at any moment in time."

We did that for good reasons, and I will argue that it allowed us to deliver
the very valuable and successful feature of "truly instant reconfiguration",
but on review, it is now a limiting factor.

Strictly speaking we already broke that dogma with return(vcl), but we hid
that so well that we did not have to change a single word in the
documentation.

Letting it go (more) has consequences.  Most obviously, we will need some way
to decide which of multiple active VCLs we throw the incoming requests at,
but as long as "the other active VCLs" do not contain a vcl_recv{}, that is
obvious.

Sharing backends/directors and ACLs across VCLs means we need some way to
make sure all threads from other VCLs are out of this one, before we can
cool and unload it.  That is CS-101 multi-threading material, but performance
cannot be ignored.

But the immediately obvious follow-up question is:  Why can't I also
export & import SUBs?

I won't go into the details (compatibility with the vcl_method they are
called from), but that runs into an equally old dogma:

    "If you can vcl.load a VCL, you can vcl.use that VCL."
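To make A.1 concrete, here is a mock-up of what exporting and importing a
backend between two VCLs could look like.  All syntax here is my invention
for illustration; the "export" and "import ... from" keywords, and routing
the import through a label, are assumptions, not anything that was agreed:

```vcl
# backends.vcl -- loaded and labeled, e.g.:
#   vcl.load backends_1 backends.vcl
#   vcl.label lib_backends backends_1
vcl 4.2;

# Hypothetical: mark this backend as visible to other VCLs
export backend app1 {
    .host = "app1.example.com";
    .port = "8080";
}

# main.vcl -- the active VCL (a separate file, shown here for
# illustration).  The import goes through the label, so the VCL
# behind "lib_backends" can be replaced without touching main.vcl.
vcl 4.2;

import backend app1 from lib_backends;

sub vcl_backend_fetch {
    set bereq.backend = app1;
}
```

With this shape, new versions of main.vcl keep hitting the same backend
object, so its statistics and health state are not fragmented.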
That second dogma already has a footnote attached to it, relating to VMODs
being able to veto going from cold to hot, but otherwise it still holds.

It originated in a desire to have a preloaded, ever-ready "emergency VCL",
so that when the newspaper backend monster keeled over, there would be a
single /reliable/ switch to throw.

Is that a "killer-feature" or a "really, I didn't know that..."?

Right now I truly don't know the answer, so for that reason alone, sharing
SUBs is "desirable, but for further study" at this time.

So for now:  I think we should implement export/import of backends and ACLs,
since I think they "come for free", but not commit to sharing SUBs.
(See A.2 for CLI implications.)

A.1) Thoughts about implementation

Exporting things must be explicit; we do not want VCLs to be able to grab
random stuff from other VCLs, both as a matter of sanity, and to keep the
list of exported objects small.

For the same reasons as for return(vcl), the imports have to go through
labels, otherwise "the other" VCL cannot be replaced.

Exporting the backends from a single VCL, instead of replicating their
definition in new versions of the active VCL or in multi-app/tenant VCLs,
means that the statistics and state will not be fragmented.  We may want
more (see below), but it will be a step in the right direction.

A.1) Summary:

Low to medium complexity, good and concrete benefits which would be a
selling point for 8.0.

A.2) Add a central switchboard.

I think the final version of the idea we came up with was something like
this mock-up:

    vcl 4.2;

    vcl_scope {
        req.http.host suffix "example.com";
        req.url prefix "/hello_world";
        return(mine(100));
    }

These "selectors" will be merged into a single decision data structure
which a central dispatcher uses to decide where the request goes.
I think we also had consensus for adding an escape mechanism along the
lines of:

    vcl 4.2;

    sub vcl_match {
        if (client.ip ~ inhouse_acl && req.url ~ "editor") {
            return (mine(100));
        }
        return (notme);
    }

Such functions cannot be merged, but must be executed serially, which rules
them out as the only method, but there seem to be solid use-cases for having
a few, for instance purges, inhouse vs. outside, log4j detection etc.

So far, so good.

We need CLI commands to do this, including a "vcl.unuse" which we never had
before, and a "vcl.substitute" to atomically do a vcl.unuse + vcl.use.
If we are adding two new CLI commands anyway, we gain nothing from
overloading "vcl.use" as the third, so we should add three all-new CLI
commands, something like:

    vcl.insert  - add a VCL to the switchboard
    vcl.remove  - remove a VCL from the switchboard
    vcl.replace - atomic add+remove

That eliminates the need for a setting to enable this new "switchboard
mode":  We power up the switchboard on the first vcl.insert and power it
down on the last vcl.remove.

That again means that even people who do not use the switchboard would be
able to "vcl.insert log4j_mitigator.vcl" without editing their VCL.
(killer-feature?)

But that only works if the switchboard defaults to their usual VCL when
none of the vcl.insert'ed VCLs match.

So I think the final result looks like:

    There is *exactly* one active VCL at any moment in time; requests go
    there, unless the switchboard dispatches them.  (But "active" now means
    something slightly different.)

    There can be any number of "library VCLs" loaded with "vcl.library",
    containing only backends/directors and ACLs (for now).

    There can be any number of "subscriber VCLs" loaded with "vcl.insert",
    which go through the switchboard.

A.2) Thoughts about implementation

How are conflicting selectors resolved?  In the above examples I put
"mine(100)" as a way to assign priorities.  Better ideas?
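For illustration, driving the switchboard from the CLI might look like the
transcript below.  The three new command names are from the proposal above,
but their argument order and the response texts are pure guesswork on my
part:

```
varnish> vcl.load mitigator /etc/varnish/log4j_mitigator.vcl
200 VCL compiled.

varnish> vcl.insert mitigator
200 Switchboard powered up; 1 subscriber VCL.

varnish> vcl.load mitigator_v2 /etc/varnish/log4j_mitigator_v2.vcl
200 VCL compiled.

varnish> vcl.replace mitigator mitigator_v2
200 Replaced.

varnish> vcl.remove mitigator_v2
200 Switchboard powered down.
```

Note how the regular active VCL is never touched: requests that no
subscriber VCL matches simply keep going to it.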
I'm slightly concerned about the rebuilding/reconfiguration of the merged
decision data structure when there are many VCLs.

Nobody argued for using regular expressions, which I suspect was partly a
healthy respect for implementing the merge, and partly because those fields
are not just strings (%xx escapes, case-insensitivity, I18N DNS etc.)

It seems obvious to allow multiple selectors on each of the two fields, and
to give them "or" semantics, so that a single vcl_scope{} can match multiple
domains and/or multiple URLs.

But assuming the two fields (host+url) inside the selector have "and"
semantics, I think we should also allow multiple vcl_scope{} per VCL, so
that a single VCL can handle:

    vcl_scope {
        req.http.host suffix "example.com";
        req.url prefix "/hello_world";
        return(mine(100));
    }

    vcl_scope {
        req.http.host suffix "examplè.fr";
        req.url prefix "/bonjour_monde";
        return(mine(100));
    }

    vcl_scope {
        req.http.host suffix "example.de";
        req.url prefix "/guten_heute_leute";
        return(mine(100));
    }

And if that still cannot do what people want, there is the vcl_match{}
escape mechanism.

I wonder if host+url is too restrictive?  I can imagine, but do not know
the relevance of, also selecting on user-agent and particular cookies being
present or absent, but with the escape mechanism, we can collect real-world
experience before we decide that.

A.2) Summary:

This one goes all over the place:  VCC, CLI, locking, and using somebody's
exam results in CS data structures in real life.  I can't imagine this is
realistic for 8.0, and I don't see any way to be "a little bit pregnant"
with it.  But if my outline holds up to scrutiny, it is additive and will
not have to wait for 9.0.

B) "Plain backends are too plain"
---------------------------------

Points of pain:

    "DNS answers with multiple IPs"
    "DNS response frozen at vcl.load time"
    "Probing backends with rapidly changing IPs"
    "Fragmented (connection pool) statistics"

B) Diagnosis:

In 2006 backends were real backends and Kubernetes was not a real word.
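To illustrate the pain point:  a perfectly reasonable declaration like the
one below is rejected by VCC today when the hostname resolves to more than
one address of the same family (the hostname is invented for the example):

```vcl
vcl 4.2;

# VCC currently refuses to compile this if "app.example.com"
# resolves to multiple IP addresses, which locks users of
# round-robin DNS and Kubernetes-style services out entirely.
backend default {
    .host = "app.example.com";
    .port = "8080";
}
```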
Until we have a "discover" service which checks if the DNS response has
changed, we are stuck with freezing the DNS response at vcl.load time.

But we could stop being anal about DNS responses with multiple IP numbers,
which would at least allow people to work around that limitation by
reloading their VCL every N minutes.

B) Concrete proposals:

Have VCC accept DNS responses with multiple IPs, and use them round-robin.

B) Thoughts about implementation

I'm not sure "use them round-robin" cuts it, for instance if we get both
IPv4 and IPv6 addresses but have no IPv6 connectivity.  A better default
policy may be:  "Once you find one that works, stick with it, until it
stops working."

Do we probe all the IPs?

Should we compile it into a round-robin director, to avoid code duplication?

B) Summary:

Once the questions are answered, this should be pretty straightforward, and
not difficult to complete before 8.0.  (Famous Last Words™)

C) VSL roll-overs
-----------------

Points of pain:

    "Extra memory copies in clients to 'evacuate' requests in danger of
     being overwritten"
    "Complexity in clients to monitor danger of overwrites."

C) Diagnosis

In 2006 wire-speed was 100 Mbit/sec, and if your VSL clients were not fast
enough, that was not our problem.

C) Concrete proposal

Instead of one big SHM segment, varnishd creates N distinct files which
occupy the same amount of space, and announces them in the VSM.

Varnishd picks an available segment and updates its "open" and "do not use
past" timestamps in the VSM.  When that segment is full, the process
repeats.

VSL clients monitor the index and process the files in timestamp sequence.

When a client opens a segment, it links a unique filename to that file, so
the inode link count increases, and it removes that filename again when it
no longer needs any data in that segment.  Clients should arm atexit(3)
handlers to nuke the unique filenames when they exit.
Varnishd considers a segment available if its previous "do not use past"
timestamp has expired, and the inode link count is one.

C) Thoughts about implementation

This proposal eliminates VSL overwrites entirely, but adds some new failure
modes:

A VSL client can die without removing the unique filename which holds the
inode link, leaving segment(s) locked until those stray files are removed.
If the VSL clients' unique names are predictable from their PID, varnishd
could patrol such files with kill(0).

When clients are too slow or get stuck, varnishd may run out of available
segments, and will then serve traffic without logging it.  Counters should
record how many transactions and VSL records were not written, and the VSM
needs to communicate to clients that there is a hole in the VSL stream,
otherwise the clients may never release the prior segments they hold on to.
A parameter can change the default, so that varnishd instead stops serving
traffic which cannot be logged.

Here the "do not use past" timestamp can be used as a configurable minimum
duration of VSL "look-back".

The inode link-count trick is neat, but it involves the filesystem, and
that may be too expensive.  Once VIPC is in, we can use that instead and
eliminate the "stray files" problem.  In light of the "make all CLI JSON"
discussion, maybe this should be the first customer of VIPC?

C) Summary

Very limited amount of code involved; this might make it into 8.0.

Feedback kindly requested...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
p...@freebsd.org        | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev