To revive this old thread...

On Sep 5, 2012, at 9:35 PM, Asher Feldman wrote:
> On Tue, Sep 4, 2012 at 3:11 PM, Platonides wrote:
>
>> On 03/09/12 02:59, Tim Starling wrote:
>>> I'll go for option 4. You can't delete the images from the backend
>>> while they are still in Squid, because then they would not be purged
>>> when the image is updated or action=purge is requested. In fact, that
>>> is one of only two reasons for the existence of the backend thumbnail
>>> store on Wikimedia. The thumbnail backend could be replaced by a text
>>> file that stores a list of thumbnail filenames which were sent to
>>> Squid within a window equivalent to the expiry time sent in the
>>> Cache-Control header.
>>> -- Tim Starling
>>
>> The second one seems easy to fix. The first one should IMHO be fixed in
>> squid/varnish by allowing wildcard purges (ie. PURGE
>> /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
>
> fast.ly implements group purge for varnish like this via a proxy daemon
> that watches backend responses for a "tag" response header (i.e. all
> resolutions of Tim_starling.jpg would be tagged that) and builds an
> in-memory hash of tags->objects which can be purged on. I've been told
> they'd probably open source the code for us if we want it, and it is
> interesting (especially to deal with the fact that we don't purge articles
> at all of their possible url's) albeit with its own challenges. If we
> implemented a backend system to track thumbnails that exist for a given
> orig, we may be able to remove our dependency on swift container listings
> to purge images, paving the way for a second class of thumbnails that are
> only cached.

How about this idea:

Just "purge all images with this prefix" doesn't really work in Squid or
Varnish, because they don't store their cache database in a format that
makes it cheap to determine which objects would match that.
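For comparison, the tag-based group purge quoted above avoids that lookup problem because the tag->objects index is built as responses pass through the proxy, so purging a tag never has to scan the cache. Below is a minimal Python sketch of such an index, not fast.ly's actual daemon; the `X-Purge-Tag` header name and the `TagPurgeIndex` class are hypothetical, and each URL it returns would still need its own ordinary PURGE.

```python
from collections import defaultdict


class TagPurgeIndex:
    """Hypothetical in-memory index of tag -> cached URLs.

    Toy model of a proxy that watches backend responses for a tag header;
    not fast.ly's actual implementation.
    """

    TAG_HEADER = "X-Purge-Tag"  # assumed header name, space-separated tags

    def __init__(self):
        self._urls_by_tag = defaultdict(set)  # tag -> URLs carrying it
        self._tags_by_url = defaultdict(set)  # URL -> its tags, for cleanup

    def observe_response(self, url, headers):
        """Record the tags on a backend response as it passes through."""
        for tag in headers.get(self.TAG_HEADER, "").split():
            self._urls_by_tag[tag].add(url)
            self._tags_by_url[url].add(tag)

    def purge_tag(self, tag):
        """Return every URL carrying `tag` (sorted) and drop them from the index.

        The caller would then send one ordinary PURGE per returned URL.
        """
        urls = self._urls_by_tag.pop(tag, set())
        for url in urls:
            # Unlink the purged URL from any other tags it carried.
            for other in self._tags_by_url.pop(url, set()) - {tag}:
                self._urls_by_tag[other].discard(url)
                if not self._urls_by_tag[other]:
                    del self._urls_by_tag[other]
        return sorted(urls)


if __name__ == "__main__":
    index = TagPurgeIndex()
    base = "/wikipedia/commons/thumb/5/5c/Tim_starling.jpg"
    for width in (120, 220, 800):
        index.observe_response(f"{base}/{width}px-Tim_starling.jpg",
                               {"X-Purge-Tag": "Tim_starling.jpg"})
    for url in index.purge_tag("Tim_starling.jpg"):
        print("PURGE", url)
```

The cost moves from purge time to insert time: the index must stay in sync with what the cache actually holds, which is one of the "challenges" the quoted message alludes to.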
Varnish could do it with its "bans", but each ban is kept around for a long time, and with the tens, sometimes hundreds, of purges a second we do, this would quickly add up to a massive ban list.

But... Varnish allows you to customize how it hashes objects into its object hash table (vcl_hash). What we could do is hash thumbnails to the same hash key as their original. Because of our current URL structure, that's pretty much a matter of stripping off the thumbnail suffix (and the /thumb/ path component). Then the original and all its associated thumbnails end up at the same hash key in the hash table, and only a single purge for the original would nuke them all out of the cache.

This relies on Varnish having an efficient implementation for multiple objects at a single hash key. It probably does, since it implements Vary processing this way. We would essentially be doing the same, Vary-ing on the thumbnail size. But I'll check the implementation to be sure.

Of course this won't work for Squid, but I'm pretty close to being able to replace Squid with Varnish entirely for upload.

-- 
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
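The vcl_hash idea in Mark's reply can be modeled in a few lines of Python, assuming Wikimedia's upload URL layout (originals at `/wikipedia/commons/5/5c/Tim_starling.jpg`, thumbnails under `/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/<width>px-Tim_starling.jpg`). `hash_key`, `THUMB_RE` and `CollapsedCache` are hypothetical names; the real change would be VCL in `vcl_hash`, and the per-URL dictionary inside each bucket merely stands in for Varnish keeping Vary variants under one hash key.

```python
import re

# Assumed Wikimedia upload layout:
#   original:  /<project>/<wiki>/<h1>/<h2>/<File>
#   thumbnail: /<project>/<wiki>/thumb/<h1>/<h2>/<File>/<width>px-<File>
THUMB_RE = re.compile(
    r"^(?P<prefix>/[^/]+/[^/]+)/thumb/(?P<path>[0-9a-f]/[0-9a-f]{2}/[^/]+)/[^/]+$"
)


def hash_key(url_path):
    """Map a thumbnail URL to its original's URL; leave other URLs alone."""
    m = THUMB_RE.match(url_path)
    if m:
        return f"{m.group('prefix')}/{m.group('path')}"
    return url_path


class CollapsedCache:
    """Toy cache where an original and its thumbnails share one bucket,
    the way Varnish keeps Vary variants under a single hash key."""

    def __init__(self):
        self._buckets = {}  # hash key -> {full URL: body}

    def store(self, url_path, body):
        self._buckets.setdefault(hash_key(url_path), {})[url_path] = body

    def lookup(self, url_path):
        # The full URL picks the variant, standing in for Vary on size.
        return self._buckets.get(hash_key(url_path), {}).get(url_path)

    def purge(self, url_path):
        """Drop the whole bucket; return how many objects were evicted."""
        return len(self._buckets.pop(hash_key(url_path), {}))


if __name__ == "__main__":
    cache = CollapsedCache()
    orig = "/wikipedia/commons/5/5c/Tim_starling.jpg"
    cache.store(orig, b"original bytes")
    for width in (120, 220, 800):
        cache.store("/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/"
                    f"{width}px-Tim_starling.jpg", b"thumbnail bytes")
    print(cache.purge(orig))  # 4: the original plus three thumbnails
```

The purge never needs to know which thumbnail sizes exist; that knowledge is implicit in the shared bucket, which is what would let it replace the container listings used today.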
