Smalyshev added a comment.

  > don't do cache busting on events older than X
  
  This, however, gave me an idea. If we kept a map of the latest revision IDs
for all items we've recently updated, we could eliminate a lot of stale updates,
especially when we're catching up after lagging behind. The first mention of an
item would fetch its latest revision, and all the following events for that
item (up to that revision) would basically be ignored.
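  A minimal sketch of that filter, assuming the fetch for an item returns its
latest revision ID and that we record that ID afterwards. The class and method
names (RecentRevisionFilter, shouldFetch, recordFetched) are hypothetical, not
existing updater code. Events whose revision is not newer than what we already
fetched are skipped. In this sketch, events without a revision ID simply bypass
the filter.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cross-batch filter: remembers the latest revision ID already
 * fetched for each recently updated item and skips events that are not newer.
 */
class RecentRevisionFilter {
    // item ID (e.g. "Q42") -> highest revision ID already fetched for it
    private final Map<String, Long> latestFetched = new HashMap<>();

    /**
     * Returns true if the event should trigger a fetch.
     * revisionId may be null for events that carry no revision (e.g. deletes);
     * this sketch assumes those are always passed through.
     */
    boolean shouldFetch(String itemId, Long revisionId) {
        if (revisionId == null) {
            return true;
        }
        Long known = latestFetched.get(itemId);
        return known == null || revisionId > known;
    }

    /** Records the revision actually fetched (the item's latest at fetch time). */
    void recordFetched(String itemId, long fetchedRevisionId) {
        // Keep the maximum in case fetches complete out of order.
        latestFetched.merge(itemId, fetchedRevisionId, Math::max);
    }
}
```

For example, if a backlog holds Q42@100, Q42@101 and Q42@105, the first event
triggers a fetch that returns revision 105; after recordFetched("Q42", 105) the
other two are dropped, while a later Q42@106 would still go through. Note that
this sketch never evicts entries, so a real version would need some bound on
what counts as "recently updated".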
  
  Right now we do something like that within a batch, and we again match the
revision IDs against the database after the fetches, but this way we could do
it across batches and eliminate the unnecessary fetches. That would basically
solve the problem of lots of fetches (while the cache is active), since each
item would be fetched only once per backlog. I think with a proper data
structure (something like SparseArray, maybe?) we could keep a lot of history
there relatively cheaply (we just need one 64-bit int per item). This probably
won't work for changes that lack a revision ID, like deletes, but we could
either ignore those (they are relatively rare) or also use timestamps
(dangerous).
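  One way the compact storage could look, as a sketch only: a SparseArray-style
map from the numeric part of an item ID (the 42 in "Q42") to its latest revision
ID, using two parallel primitive arrays sorted by key and binary search for
lookups. SparseRevisionMap and putMax are hypothetical names. The sketch
assumes only Q-items are tracked (so numeric keys don't collide with P or L
IDs) and uses -1 as "unknown", which relies on revision IDs being positive.

```java
import java.util.Arrays;

/**
 * Hypothetical compact map: item number -> latest revision ID.
 * Keys are kept sorted so lookups are a binary search over an int[].
 */
class SparseRevisionMap {
    private int[] keys = new int[16];
    private long[] values = new long[16];
    private int size = 0;

    /** Returns the stored revision, or -1 if the item is unknown. */
    long get(int itemNumber) {
        int i = Arrays.binarySearch(keys, 0, size, itemNumber);
        return i >= 0 ? values[i] : -1L;
    }

    /** Stores the revision, keeping only the highest one seen for the item. */
    void putMax(int itemNumber, long revisionId) {
        int i = Arrays.binarySearch(keys, 0, size, itemNumber);
        if (i >= 0) {
            values[i] = Math.max(values[i], revisionId);
            return;
        }
        int insertAt = -(i + 1); // binarySearch encodes the insertion point
        if (size == keys.length) {
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail right by one slot to make room at insertAt.
        System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        keys[insertAt] = itemNumber;
        values[insertAt] = revisionId;
        size++;
    }

    int size() {
        return size;
    }
}
```

This costs about 12 bytes per entry (4-byte key plus 8-byte revision) before
array growth slack, but inserting a new key is O(n) because of the shift. If
item numbers are dense enough, a plain long[] indexed by item number would need
exactly one 64-bit value per slot, matching the estimate above.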

TASK DETAIL
  https://phabricator.wikimedia.org/T217897


To: Smalyshev
Cc: Smalyshev, BBlack, Aklapper, Gehel, alaa_wmde, Legado_Shulgin, Nandana, 
thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, 
Zppix, _jensen, rosalieper, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
