Smalyshev added a comment.
> don't do cache busting on events older than X

This, however, gave me an idea. If we kept a map of the latest revision IDs for all items we've recently updated, we could eliminate a lot of stale updates, especially when we're catching up after lag. The first mention of an item would fetch the latest revision, and all following events for that item would basically be ignored. Right now we do something like that within a batch, and again match the revision IDs against the database after the fetches, but this way we could do it cross-batch and eliminate the unnecessary fetches.

Basically, that would solve the problem of lots of fetches (while the cache is active), since each item would be fetched only once per backlog. I think with a proper data structure (like SparseArray maybe?) we could keep a lot of history there relatively cheaply (we just need one 64-bit int per item).

It also probably won't work for changes that lack a revision ID, like deletes, but we could either ignore those (they are relatively rare) or also use timestamps (dangerous).

TASK DETAIL
  https://phabricator.wikimedia.org/T217897
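
To make the cross-batch idea in the comment above concrete, here is a minimal sketch in Python. It is not the updater's actual code: the class `LatestRevisionMap`, the function `process`, the `fetch_latest` callback, and the `(item_id, revision)` event shape are all hypothetical names chosen for illustration, and a bounded `OrderedDict` stands in for whatever compact structure (SparseArray or similar) would really be used.

```python
from collections import OrderedDict


class LatestRevisionMap:
    """Bounded map of item ID -> latest revision already fetched (hypothetical)."""

    def __init__(self, max_items=1_000_000):
        self._revs = OrderedDict()
        self._max_items = max_items

    def is_stale(self, item_id, revision):
        # Events without a revision ID (e.g. deletes) are never treated as stale.
        if revision is None:
            return False
        known = self._revs.get(item_id)
        return known is not None and revision <= known

    def record(self, item_id, revision):
        # Keep the highest revision seen and mark the item as recently updated.
        self._revs[item_id] = max(self._revs.get(item_id, -1), revision)
        self._revs.move_to_end(item_id)
        # Evict the least recently updated items once the bound is exceeded.
        while len(self._revs) > self._max_items:
            self._revs.popitem(last=False)


def process(events, fetch_latest, seen):
    """Handle events across batches, skipping those the map shows as stale.

    events: iterable of (item_id, revision or None)
    fetch_latest: callback returning the latest revision ID for an item
    Returns the list of events that were not skipped.
    """
    handled = []
    for item_id, revision in events:
        if revision is None:
            # No revision ID: pass through without consulting the map.
            handled.append((item_id, None))
            continue
        if seen.is_stale(item_id, revision):
            continue  # already fetched this revision or a newer one
        latest = fetch_latest(item_id)
        seen.record(item_id, latest)
        handled.append((item_id, latest))
    return handled
```

Because the same `seen` object is passed to every call of `process`, an item fetched in one batch suppresses older events for it in later batches. Events without a revision ID are simply passed through here, which corresponds to neither of the two options mentioned in the comment; how to treat them is left open.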
