Addshore added a comment.

  In T217897#5066900 <https://phabricator.wikimedia.org/T217897#5066900>, 
@Smalyshev wrote:
  
  > > WDQS does know what the latest version of the entity that it is trying to 
get updates for is,
  >
  > But "last version that WDQS knows of" can be very different from "last 
version that Wikidata has". That's the whole issue.
  >
  > I had an idea recently though. Maybe we could work it in two modes - if 
the stream is lagged sufficiently, we use the "latest available" mode - to 
jump to the front, but if we're more or less current, the probability of our 
change being current is high, so we could use the "by revision ID" mode. Need 
to look at edit timings to see if it's workable, but maybe splitting the two 
cases - catching up from a large lag and keeping current - would be more 
efficient and allow us to use the cache for the most frequent case (which is 
"keeping current"). That would be easy to implement and test - just a couple 
of if's in proper places.
  
  
  That sounds like a pretty good idea!
  How often do the updaters actually fall behind the stream?
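
  As a rough sketch, the "couple of if's" could look something like the 
following (illustrative only; FetchMode, LAG_THRESHOLD and the lag 
calculation are my assumptions, not existing Updater code):

      import java.time.Duration;
      import java.time.Instant;

      enum FetchMode { LATEST_AVAILABLE, BY_REVISION_ID }

      class FetchModeChooser {
          // Assumed threshold: beyond this lag, the revision ID from the
          // stream is unlikely to still be the latest, so jump to the front.
          private static final Duration LAG_THRESHOLD = Duration.ofSeconds(30);

          static FetchMode choose(Instant changeTimestamp) {
              Duration lag = Duration.between(changeTimestamp, Instant.now());
              return lag.compareTo(LAG_THRESHOLD) > 0
                      ? FetchMode.LATEST_AVAILABLE
                      : FetchMode.BY_REVISION_ID;
          }
      }

  The nice property being that "by revision ID" requests are fully cacheable, 
and that should be the common case when keeping current.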
  
  Another thing that we could also tweak is the cache busting method. Right 
now a timestamp with one-second granularity is used.
  If the cache buster were slightly coarser (such as only timestamps ending 
in even seconds, or in 0 and 5), the probability of cache hits within the 
same few seconds across the different updaters in the cluster would be 
greatly increased (see the sketch below).
  But this would be a cherry on top, and if the majority of requests already 
use the revid, this probably isn't too bad as-is.
  Whether this would work again depends on what the internals of the updater 
do: whether a coarser cache buster means things could definitely be stale in 
places, or whether the updater would be able to handle that.
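
  A minimal sketch of that bucketing (assuming the cache buster is just a 
timestamp appended to the request; the class and method names are 
hypothetical):

      import java.time.Instant;

      class CacheBuster {
          // Round the current time down to a 5-second bucket, so different
          // updaters requesting the same entity within the same window
          // produce identical URLs and can share a cache entry.
          static String coarseCacheBuster() {
              long epochSeconds = Instant.now().getEpochSecond();
              return Long.toString(epochSeconds - (epochSeconds % 5));
          }
      }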
  
  Another option, which would be more involved, would be to have a single 
consumer of the stream do the hard work (generating the SPARQL) and emit the 
result into another stream, so that Wikibase is only hit once for each change 
rather than once by each updater (sketched below).
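
  A rough sketch of that shape, assuming the change stream and the derived 
stream are both Kafka topics (the topic names, configs and generateSparql() 
are illustrative assumptions, not existing WDQS code):

      import java.time.Duration;
      import java.util.List;
      import java.util.Properties;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.KafkaConsumer;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      class SparqlPreprocessor {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");
              props.put("group.id", "sparql-preprocessor");
              props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
              props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
              props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

              try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                   KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                  consumer.subscribe(List.of("wikidata-changes"));
                  while (true) {
                      for (ConsumerRecord<String, String> change :
                              consumer.poll(Duration.ofSeconds(1))) {
                          // The expensive part happens exactly once here:
                          // fetch the entity from Wikibase and turn the
                          // change into SPARQL for the updaters to apply.
                          String sparql = generateSparql(change.value());
                          producer.send(new ProducerRecord<>(
                                  "wikidata-sparql-updates", change.key(), sparql));
                      }
                  }
              }
          }

          // Placeholder for the real work: fetch RDF from Wikibase, diff
          // against the previous state, and emit SPARQL update statements.
          static String generateSparql(String changeEvent) {
              return "# SPARQL derived from: " + changeEvent;
          }
      }

  Each updater would then just replay the derived topic instead of hitting 
Wikibase itself.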
  
  In T217897#5068290 <https://phabricator.wikimedia.org/T217897#5068290>, 
@Smalyshev wrote:
  
  > @Addshore btw do I understand right that constraints can not be fetched 
per-revision? In this case, do we still need cache-busting there? Or do 
constraints manage their own caches? I am not sure what to do here.
  
  
  Constraints can not be fetched per revision; you can only get the latest 
version.
  The constraint check results for a single revision can change, so there is 
little point in tying them to a revid.
  Once the work in this area is finished, the results will be persistently 
stored, so retrieving them will be cheap: they will be calculated after each 
edit, persisted, and then an event will be added to a stream saying that new 
constraint check data for X now exists.
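
  As a sketch of what such a stream event might carry (a hypothetical schema, 
since the results are deliberately not tied to a revid):

      import java.time.Instant;

      // Event announcing that fresh, persisted constraint check results
      // exist for an entity; consumers simply re-fetch the latest results.
      record ConstraintCheckEvent(
              String entityId,   // e.g. "Q42"
              Instant checkedAt  // when the persisted results were computed
      ) {}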
