[Bug 63248] search pageimages feature causes significant extra infrastructure load

bugzilla-daemon Sat, 29 Mar 2014 12:39:07 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=63248


--- Comment #7 from Max Semenik <[email protected]> ---
(In reply to Faidon Liambotis from comment #5)
> I don't think the cacheable bit will help much. The pageimages API parameter
> will be for a more-or-less unique combination of articles OR'ed with each
> other, as they resulted from the search query. I'm not sure of the
> distribution of our search queries, but I'm guessing caching those
> combinations won't make a huge difference

The frequency dump says otherwise:
* The top line would have been cached (but we're killing it with 121930
anyway).
* From line 5 onwards there are a bunch of popular requests with titles from
prefix search for "r", "ri", "m", "k", "john" etc - these are not random
fluctuations and will always be present with some noticeable frequency so
caching them will be highly beneficial.
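
To illustrate the reasoning, here is a rough sketch of how a best-case hit
ratio could be estimated from such a frequency dump. It assumes an ideal cache
(no expiry, no eviction) and that each request is identified by its full URL;
the function name is made up for this example.

```python
from collections import Counter
from typing import Iterable


def estimated_hit_ratio(requests: Iterable[str]) -> float:
    """Best-case cache hit ratio for a sequence of request URLs.

    Assumption: an ideal cache, so the first occurrence of each distinct URL
    is a miss and every later repeat is a hit.
    """
    counts = Counter(requests)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    misses = len(counts)  # one miss per distinct URL
    return (total - misses) / total
```

The more often the same title combinations repeat (as with short prefixes
such as "r" or "m"), the closer this ratio gets to 1.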

> (and it will be a waste of cache memory/disk).

API results with cache mode = 'public' have a lifetime of 12 hours; this should
be enough to prevent infinite hoarding of cache objects. Anyway, most query
modules' results are public, so this will not change the situation much.

> It won't help client performance either, as clients will still do 17 extra
> requests on every search for this feature. I think the design of this whole
> feature honestly sounds a bit naive to me. We should really go a step
> backwards and rethink the best way to do this with server- and client-side
> performance in mind.

> The dozen upload.wikimedia.org requests to fetch a 1-2KB thumb each is
> still going to kill client-side performance though. There's a tremendous
> overhead of data for the request and round-trips that will delay the total
> page load time from a viewer's perspective. It's also high-latency requests,
> as these 50px images in most cases will need to be scaled on the fly.
> Unfortunately our architecture makes it difficult to return this directly in
> the result set as e.g. data URIs but if that'd work maybe we could find a
> way.

The current implementation tries to work around this by waiting for 500ms after
the search results are output before displaying the images. The problem is that
it doesn't retrieve the page image information in the same request as the
search, resulting in extra API requests.
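
For reference, a minimal sketch of what that follow-up request might look
like once the search titles are known. The endpoint URL and the helper name
are assumptions for illustration; the parameters are the usual prop=pageimages
ones, and the thumbnail size of 50 comes from the 50px figure quoted above.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def pageimages_url(titles, thumb_size=50):
    """Build the second request: page images for titles returned by search."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageimages",
        "piprop": "thumbnail",
        "pithumbsize": thumb_size,
        "pilimit": len(titles),
        # Titles are OR'ed together with "|", so each distinct result set
        # produces a distinct URL (and therefore a distinct cache object).
        "titles": "|".join(titles),
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

This is separate from, and in addition to, the request that produced the
titles in the first place.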

(In reply to Jon from comment #6)
> Ideally the search results API would return these results. This would remove
> the need for the or query and the additional get itself. Could the results
> of the search api query be piped into page images?

Currently it's not possible, as prefix search is available only through the
non-query opensearch module. It should be trivial to add a similar generator
module, however - I wonder why it hasn't been done yet :)
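
A sketch of how a single combined request could look if such a generator
existed. The generator name "prefixsearch" and its "gps"-prefixed parameters
are hypothetical here (no such module is available per the above), as are the
endpoint, the helper name and the default limit of 10.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def search_with_images_url(prefix, limit=10, thumb_size=50):
    """Build one request that does prefix search and fetches page images."""
    params = {
        "action": "query",
        "format": "json",
        # Hypothetical generator module and parameter names.
        "generator": "prefixsearch",
        "gpssearch": prefix,
        "gpslimit": limit,
        # Page images are attached to whatever pages the generator yields.
        "prop": "pageimages",
        "piprop": "thumbnail",
        "pithumbsize": thumb_size,
        "pilimit": limit,
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

With this shape the URL depends only on the typed prefix, not on the
combination of titles, so popular prefixes would map to the same cache object.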

> As an interim would it make any difference if we just fetched the top 5
> results and threw in some client side caching so that additional images are
> not retrieved?

How reliable would it be to detect which list items are actually visible on
screen? This would increase cache fragmentation, though, because currently
you're always requesting images for as many results as you have.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
