https://bugzilla.wikimedia.org/show_bug.cgi?id=56798

--- Comment #3 from Nik Everett <[email protected]> ---
Something like this:

while count > 0:
  SELECT MAX(pl_page_id), COUNT(*) FROM (SELECT pl_page_id FROM page_link WHERE
pl_page_id > $last_max$ ORDER BY pl_page_id LIMIT 10000) AS batch
?
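
To make the batching idea concrete, here is a minimal runnable sketch in Python
against an in-memory SQLite database. The table and column names (page_link,
pl_page_id) are taken from the query above; the function name, the demo data
and the use of SQLite are assumptions for illustration only. One deliberate
difference, also an assumption on my part: each batch is limited to N distinct
ids (via GROUP BY) rather than N rows, so an id with many rows is never split
across a batch boundary and then skipped by the "> last_max" filter.

```python
import sqlite3


def count_links_in_batches(conn, batch_size=10000):
    """Count all rows in page_link, walking pl_page_id in ascending batches.

    Hypothetical helper: each batch covers up to batch_size distinct ids.
    """
    total = 0
    last_max = -1  # assumes ids are non-negative integers
    while True:
        max_id, count = conn.execute(
            """
            SELECT MAX(pl_page_id), SUM(n) FROM (
                SELECT pl_page_id, COUNT(*) AS n
                FROM page_link
                WHERE pl_page_id > ?
                GROUP BY pl_page_id
                ORDER BY pl_page_id
                LIMIT ?
            ) AS batch
            """,
            (last_max, batch_size),
        ).fetchone()
        if max_id is None:  # empty batch: nothing left past last_max
            break
        total += count
        last_max = max_id
    return total


def make_demo_db():
    """Build a small in-memory table where id k has k rows (k = 1..7)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_link (pl_page_id INTEGER NOT NULL)")
    conn.execute("CREATE INDEX pl_page_id_idx ON page_link (pl_page_id)")
    rows = [(page_id,) for page_id in range(1, 8) for _ in range(page_id)]
    conn.executemany("INSERT INTO page_link VALUES (?)", rows)
    return conn
```

Each iteration only touches the index range after the previous maximum, so no
single query has to scan the whole table, and the loop ends on the first empty
batch.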


We're sure we'll get rid of the SQL-based counting in the normal update case,
but in the population/outage recovery case (both in-process and job-queue
based) I was thinking of keeping it (or modifying it like you suggest). The
idea is that SQL-based counting will be right even if Elasticsearch is super
out of date, and it'll certainly be out of date in the population case.
Without it we'd need a second pass at populating Elasticsearch to count the
links, which just seems complicated/burdensome/nasty.

I had a look at BacklinkCache a while ago but it looked like it was pulling all
the backlinks into memory to count them.  That didn't seem pretty.

