Re: [Wikitech-l] is this how our thumbnail caching works?

2014-05-15 Thread Sumana Harihareswara
Bryan's made some updates to the docs! Here's what Bryan just said to me
(forwarded with permission):

 I put the core of my narrative on wikitech [0] along with the diagram.
 The page I put that on could use a little more attention to ensure
 that references to pmtpa are gone. It might be good to remove the
 discussion of ceph as well since that migration (swift to ceph) was
 cancelled and isn't likely to start again for a while. I haven't heard
 anything mentioned about it in the last couple of Core quarterly
 reviews.
 
 [0]: https://wikitech.wikimedia.org/wiki/Media_storage#First_request_for_a_thumbnail_image
 
 Bryan

Thanks to everyone who clarified this. I hope this helps move
https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
forward!


-- 
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] is this how our thumbnail caching works?

2014-05-14 Thread Bryan Davis
On Tue, May 13, 2014 at 4:13 PM, Sumana Harihareswara
suma...@wikimedia.org wrote:
 I am trying to figure out how thumbnail retrieval & caching works right
 now - with Swift, and the frontline & secondary (frontend and
 backend) Varnishes. (I am working on the caching-related bit of the
 performance guidelines, and want to understand and help push forward on
 https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
 .) I looked for docs but didn't find anything that had been updated this
 year.

I was supposed to document this stuff when I first started with the
Foundation. Unfortunately I never really got it done. I've got some
notes and, possibly most helpfully, a diagram that I redrew in
OmniGraffle based on a diagram that Faidon drew on the wall at the
office for me one day last fall. I've had this sitting around on my
local hard drive for months without uploading it anywhere, so I just
threw it up on mw.o [0].

The diagram shows the major components that you described in your
summary. Traffic from the internet for
http://upload.wikimedia.org/.../some_thumb_url.png hits a front end
LVS which routes to a frontend Varnish server. If the URL is not
cached locally by that Varnish instance, it will compute a hash of the
URL to select the backend Varnish instance that may have the content.
If the backend Varnish doesn't have the content it will request the
thumbnail from the Swift cluster. This request passes through an LVS
that selects a frontend Swift server. The frontend Swift server will
handle the request by asking the backend Swift cluster for the desired
image. If the image isn't found in the backend cluster, the frontend
Swift server will make a request to an image scaler server to have it
created. The image scalers run thumb.php from mediawiki/core.git to
fetch the original image from Swift (which goes back through the same LVS
-> Swift frontend -> Swift backend path as the thumb request came
down). Once the original image is on the image scaler it will run it
through the MIME-type-appropriate scaling software to produce a
thumbnail image. I don't remember if at this point the image is stored
in Swift by the image scaler via thumb.php's internal logic or if that
is handled by the frontend Swift server when it gets the response. In
either case, the newly created thumbnail ends up stored in the Swift
cluster and is returned as the image scaler's http response to the
frontend Swift server handling the original request. The frontend
Swift server in turn returns the thumbnail image to the backend
Varnish server which will cache it locally and then return the image
to the frontend Varnish. Finally the frontend Varnish will cache the
image response in local memory and return the image to the original
requestor.
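
The lookup order described above can be sketched as a chain of caches, each falling through to the next on a miss. This is only an illustrative model, not the production configuration: the class and function names are hypothetical, the SHA-256 modulo choice stands in for whatever hashing Varnish actually uses, and the Swift tier is assumed to do the storing (the paragraph above leaves open whether the scaler or the frontend Swift server does that).

```python
import hashlib


class Tier:
    """One cache layer that asks the next layer on a miss (hypothetical model)."""

    def __init__(self, name, fetch_on_miss):
        self.name = name
        self.store = {}
        self.fetch_on_miss = fetch_on_miss

    def get(self, url, trace):
        if url in self.store:
            trace.append(self.name + ":hit")
            return self.store[url]
        trace.append(self.name + ":miss")
        body = self.fetch_on_miss(url, trace)
        self.store[url] = body  # cache the response on the way back up
        return body


def scale(url, trace):
    # Stand-in for an image scaler running thumb.php; returns fake bytes.
    trace.append("scaler:render")
    return ("thumbnail for " + url).encode()


def pick_backend(url, backends):
    # Assumed hashing scheme: the same URL always maps to the same backend.
    digest = hashlib.sha256(url.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def build_stack(n_backends=2):
    swift = Tier("swift", scale)
    backends = [Tier(f"varnish-backend-{i}", swift.get) for i in range(n_backends)]

    def route(url, trace):
        return pick_backend(url, backends).get(url, trace)

    return Tier("varnish-frontend", route)


if __name__ == "__main__":
    frontend = build_stack()
    for attempt in (1, 2):
        trace = []
        frontend.get("/thumb/example.png", trace)  # hypothetical URL
        print(attempt, trace)
```

The first request should miss at every layer and reach the scaler; the second should be answered by the frontend Varnish alone.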

The next time this exact thumbnail is requested, it may be found in
the frontend Varnish if the LVS routes to the same Varnish and it
hasn't been evicted from the in-memory cache by time or the need to
store something newer. The image will stay in the backend Varnish
cache until it ages out based on the response headers or it is evicted
to make room for newer content. In the worst case the thumbnail will
be found in the Swift cluster where 3 copies of the thumbnail file are
stored indefinitely. The only way that the thumbnail will be removed
from Swift is when a new version of the source image is uploaded, or
the source image is deleted, and a purge request is sent out from the wiki.
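
A rough sketch of the difference between the two kinds of storage, assuming a simple time-based expiry for the Varnish layers and a plain dictionary for Swift. The names here are hypothetical and the TTL is just a parameter; nothing in this model reflects the real response headers or purge mechanism.

```python
class TtlCache:
    """Cache whose entries age out after ttl_seconds (hypothetical model)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # url -> (body, time stored)

    def put(self, url, body, now):
        self.entries[url] = (body, now)

    def get(self, url, now):
        entry = self.entries.get(url)
        if entry is None:
            return None
        body, stored_at = entry
        if now - stored_at >= self.ttl:
            del self.entries[url]  # aged out
            return None
        return body

    def purge(self, url):
        self.entries.pop(url, None)


def purge_everywhere(url, varnish_caches, swift_store):
    # A purge is the only path in this model that removes the Swift copy.
    for cache in varnish_caches:
        cache.purge(url)
    swift_store.pop(url, None)


if __name__ == "__main__":
    day = 86400
    varnish = TtlCache(ttl_seconds=7 * day)  # illustrative TTL only
    swift = {}
    url = "/thumb/example.png"  # hypothetical URL
    varnish.put(url, b"thumb", now=0)
    swift[url] = b"thumb"
    print(varnish.get(url, now=8 * day), url in swift)
    purge_everywhere(url, [varnish], swift)
    print(url in swift)
```

In this model the Varnish copy disappears on its own once the TTL passes, while the Swift copy survives until purge_everywhere is called.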


[0]: https://www.mediawiki.org/wiki/File:Thumbnail-stack.svg
[1]: https://wikitech.wikimedia.org/wiki/Swift/Dev_Notes#Removing_NFS_from_the_scalers

Bryan
-- 
Bryan Davis              Wikimedia Foundation    bd...@wikimedia.org
[[m:User:BDavis_(WMF)]]  Sr Software Engineer    Boise, ID USA
irc: bd808                                       v:415.839.6885 x6855


Re: [Wikitech-l] is this how our thumbnail caching works?

2014-05-13 Thread Brian Wolff
On May 13, 2014 7:13 PM, Sumana Harihareswara suma...@wikimedia.org
wrote:

 I am trying to figure out how thumbnail retrieval & caching works right
 now - with Swift, and the frontline & secondary (frontend and
 backend) Varnishes. (I am working on the caching-related bit of the
 performance guidelines, and want to understand and help push forward on

https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
 .) I looked for docs but didn't find anything that had been updated this
 year.

 Here's how I think it works, assuming you are a MediaWiki developer
 who's written, e.g., a page that includes a thumbnail of an image:

 First, your code must get the metadata about the image, which might come
 from the local database, or memcached, or Commons. Then, you need to get
 a thumbnail of the image at the dimensions your page requires. Rather
 than create the thumbnail immediately on demand via parsing the filename
 and dimensions, Wikimedia's MediaWiki is configured to use the 404
 handler. (see [[Manual:Thumb_handler.php]]) Your page first receives a
 URL indicating the eventual location of the thumbnail, then the browser
 asks for that URL. If it hasn't been created yet, the web server
 initially gets an internal 404 error; the 404 handler then kicks off the
 thumbnailer to create the thumbnail, and the response gets sent to the
 client.

 As it is sent to the client, each thumbnail is stored in a Swift store
 and stored in our frontline and secondary Varnish caches.

 (The Varnish caches cache entire HTTP responses, including thumbnails of
 images, frequently-requested pages, ResourceLoader modules, and similar
 items that can be retrieved by URL. The frontline Varnishes keep these
 in memory. (A weighted-random load balancer (LVS) distributes web
 requests to the front-end Varnishes.) But if a frontline Varnish doesn't
 have a response cached, it passes the request to the secondary Varnishes
 via hash-based load balancing (on the hash of the URL). The secondary
 Varnishes hold more responses, storing them on disk. Every URL is on at
 most one secondary Varnish.)

 So, at the end of this whole process, any given thumbnail is in:
 * the Swift thumbnail store (and will persist until the canonical image
 changes, or is deleted, or we run out of space and flush Swift)
 * the frontline and secondary Varnishes (and will persist until the
 canonical image changes, or is deleted, or we restart the frontline
 Varnishes or we evict data from the hard disks of the secondary Varnishes)

 Is this right?

 --
 Sumana Harihareswara
 Senior Technical Writer
 Wikimedia Foundation


That is mostly correct, afaik. The Varnish setup also includes different
caches in different locations (so during invalidation failures you can have
correct data in, say, the USA but not Europe, which confuses bug reporters
considerably).

Removal of a thumb from storage can also happen by doing ?action=purge on the
image description page. I believe the Varnish caches only store objects for a
max of 30 days (not 100% sure on that). Swift stores forever.

I'm not sure if it's in scope of what you're trying to document, but HTCP
purging is also an important aspect of how our Varnish cache works, and a
part that historically has exploded several times.

--bawolff

Re: [Wikitech-l] is this how our thumbnail caching works?

2014-05-13 Thread Matthew Flaschen

On 05/13/2014 06:13 PM, Sumana Harihareswara wrote:

I am trying to figure out how thumbnail retrieval & caching works right
now - with Swift, and the frontline & secondary (frontend and
backend) Varnishes. (I am working on the caching-related bit of the
performance guidelines, and want to understand and help push forward on
https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
.) I looked for docs but didn't find anything that had been updated this
year.

Here's how I think it works, assuming you are a MediaWiki developer
who's written, e.g., a page that includes a thumbnail of an image:


My understanding is that the image scaling/storage infrastructure is 
basically only used for user images (upload.wikimedia.org).  Code images 
generally go on http://bits.wikimedia.org/ and don't use Swift (though 
sometimes code may refer to a user image).


Matt Flaschen
