Addshore added a comment.
In T217897#5066900 <https://phabricator.wikimedia.org/T217897#5066900>, @Smalyshev wrote:

>> WDQS does know what the latest version of the entity that it is trying to get updates for is

> But "the last version that WDQS knows of" can be very different from "the last version that Wikidata has". That's the whole issue.
>
> I had an idea recently though. Maybe we could work it in two modes: if the stream is lagged sufficiently, we use the "latest available" mode to jump to the front, but if we're more or less current, the probability of our change being current is high, so we could use the "by revision ID" mode. I need to look at edit timings to see if it's workable, but splitting the two cases - catching up from a large lag and keeping current - may be more efficient and would allow us to use the cache for the most frequent case (which is "keeping current"). That would be easy to implement and test - just a couple of if's in the proper places.

That sounds like a pretty good idea (there is a rough sketch of it at the end of this comment)! How often do the updaters lag behind the stream?

Another thing we could tweak is the cache-busting method. Right now a timestamp with one-second granularity is used. If the cache buster were slightly coarser (for example, only timestamps ending in even seconds, or in 0 and 5), the probability of cache hits between the different updaters in the cluster within the same few seconds would be greatly increased (this is also sketched below). This would be a cherry on top though; if the majority of requests already use a revision ID, the current scheme probably isn't too bad. Whether it would work or not again depends on the internals of the updater: whether it would mean things could end up out of date in places, or whether the updater could handle that.

Another, more involved, option would be to have a single consumer of the stream do the hard work (generating the SPARQL) and emit the result into another stream, so that Wikibase is hit only once for each change rather than once by each updater (also sketched below).

In T217897#5068290 <https://phabricator.wikimedia.org/T217897#5068290>, @Smalyshev wrote:

> @Addshore btw do I understand right that constraints can not be fetched per revision? In this case, do we still need cache-busting there? Or do constraints manage their own caches?

I am not sure what to do here. Constraints cannot be fetched per revision; you can only get the latest version. The constraint check results for a single revision can change, so there is little point in tying them to a revision ID. Once the work in this area is finished, the results will be persistently stored, so retrieving them will be cheap: they will be calculated after each edit, persisted, and then an event will be added to a stream saying that new constraint check data for entity X now exists.
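To make the two-mode idea concrete, here is a minimal sketch of the "couple of if's". The `revision` and `flavor` parameters on Special:EntityData are real; the `nocache` parameter name, the threshold value, and the class/method names are illustrative assumptions, not the actual Updater code:

```java
// A minimal sketch of the "couple of if's": when the updater is badly lagged,
// fetch the latest entity data (with a cache buster); when it is roughly
// current, fetch by the revision ID carried in the change event, which is a
// stable URL that caches well.
import java.time.Duration;
import java.time.Instant;

public class TwoModeFetchSketch {
    // Assumption: "sufficiently lagged" means more than 2 minutes behind.
    private static final Duration LAG_THRESHOLD = Duration.ofMinutes(2);

    String entityDataUrl(String entityId, long revisionId, Instant eventTime) {
        Duration lag = Duration.between(eventTime, Instant.now());
        if (lag.compareTo(LAG_THRESHOLD) > 0) {
            // Catching up from a large lag: our event's revision is probably
            // stale, so jump to the latest available data.
            return "https://www.wikidata.org/wiki/Special:EntityData/"
                    + entityId + ".ttl?flavor=dump&nocache=" + cacheBuster();
        }
        // Keeping current: the event's revision is probably the latest, so
        // fetch that exact revision and let the caches do their work.
        return "https://www.wikidata.org/wiki/Special:EntityData/"
                + entityId + ".ttl?flavor=dump&revision=" + revisionId;
    }

    private long cacheBuster() {
        // Coarser-granularity buster; expanded in the next sketch.
        long seconds = Instant.now().getEpochSecond();
        return seconds - (seconds % 5);
    }
}
```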
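The coarser cache buster itself could be as simple as rounding the epoch second down to a multiple of five, so that all updaters fetching within the same five-second window build identical URLs and can share a cache entry. Again a sketch; the real buster format in the Updater may differ:

```java
// Sketch: a cache buster with 5-second rather than 1-second granularity.
import java.time.Instant;

public final class CoarseCacheBuster {
    private static final long GRANULARITY_SECONDS = 5;

    /** Round the current epoch second down to a multiple of five. */
    public static long value(Instant now) {
        long seconds = now.getEpochSecond();
        return seconds - (seconds % GRANULARITY_SECONDS);
    }

    public static void main(String[] args) {
        // Two calls within the same 5-second window print the same value,
        // so two updaters would request the same URL and hit the cache.
        System.out.println("nocache=" + value(Instant.now()));
    }
}
```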
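And the single-consumer option would have roughly the following shape, assuming a Kafka-backed stream; the topic names, group id, and `renderSparqlFor` helper are all hypothetical:

```java
// Sketch: one process consumes the change stream, hits Wikibase once per
// change to render the SPARQL update, and publishes the result to a second
// stream that every WDQS updater consumes and applies.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SingleRendererSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "wdqs-renderer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> changes = new KafkaConsumer<>(props);
             KafkaProducer<String, String> rendered = new KafkaProducer<>(props)) {
            changes.subscribe(List.of("wikidata.revision-create"));
            while (true) {
                for (ConsumerRecord<String, String> change : changes.poll(Duration.ofSeconds(1))) {
                    // Hit Wikibase once per change and render the SPARQL update.
                    String sparql = renderSparqlFor(change.value());
                    // Fan the result out so each updater just applies it.
                    rendered.send(new ProducerRecord<>("wdqs.rendered-updates", change.key(), sparql));
                }
            }
        }
    }

    private static String renderSparqlFor(String changeEvent) {
        return "# SPARQL update for " + changeEvent; // placeholder
    }
}
```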
