Smalyshev added a comment.

  > I'm still a bit confused about this logic inside the updater, especially 
with this id validation checking if we have the revision already etc?
  
  Not sure what you mean by "already". You can have a revision ID in the 
change and a revision ID in Wikidata, but you still have to check against the 
revision ID in Blazegraph, so that you do not replace newer data with older 
data.
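  For illustration, a minimal sketch of that check (Python, not the Updater's 
actual Java code), assuming the revision number is exposed via schema:version 
on the entity data node as in Wikidata's RDF output; the endpoint URL is just 
a placeholder:

    import requests

    SPARQL_ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"  # placeholder

    def stored_revision(entity_id):
        """Revision currently stored in Blazegraph for the entity, or 0 if absent."""
        query = """
            PREFIX schema: <http://schema.org/>
            SELECT ?rev WHERE {
              <https://www.wikidata.org/wiki/Special:EntityData/%s> schema:version ?rev .
            }
        """ % entity_id
        resp = requests.get(SPARQL_ENDPOINT, params={"query": query, "format": "json"})
        resp.raise_for_status()
        bindings = resp.json()["results"]["bindings"]
        return int(bindings[0]["rev"]["value"]) if bindings else 0

    def should_apply(entity_id, change_revision):
        # Apply the change only if it is newer than what the store already has,
        # so newer data is never overwritten with older data.
        return change_revision > stored_revision(entity_id)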
  
  > hold the latest revision of an entity in some internal queue in the updater 
for a few second while waiting for more updates, and then just commit that to 
blazegraph for storage after a few seconds
  
  Not sure how holding it in a queue for a few seconds would help anything. 
You'd just time-shift the whole process several seconds into the past, but 
otherwise nothing would change. If you mean batching the updates, we already 
do that (see the sketch below). But a batch of updates covering several 
seconds would be huge (some bots do hundreds of updates per second), and 
putting them into SPARQL queries would make those queries very slow. If we 
split them, we slow the process down and take the risk that the whole update 
was useless because newer data has already arrived. I am not sure how waiting 
for a few seconds helps anything beyond what the current process is already 
doing, while introducing additional complexity: we could no longer assume 
we're working with the latest data and would always have to track which 
delayed update the data relates to. Maybe I misunderstand something in your 
proposal.
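  For the record, here is roughly what the existing batching already does, 
sketched in Python with made-up names: within one batch only the newest 
revision per entity survives, so holding changes for extra seconds would only 
enlarge the window, not change the logic.

    from collections import OrderedDict

    def dedupe_batch(changes):
        """changes: iterable of (entity_id, revision_id) in arrival order.
        Keep only the newest revision seen for each entity."""
        latest = OrderedDict()
        for entity_id, revision_id in changes:
            if revision_id > latest.get(entity_id, 0):
                latest[entity_id] = revision_id
        return list(latest.items())

    # A bot editing the same item many times inside the window collapses to one update:
    print(dedupe_batch([("Q42", 100), ("Q42", 101), ("Q64", 7), ("Q42", 102)]))
    # -> [('Q42', 102), ('Q64', 7)]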
  
  > This means less reducing the php calls dramatically, increasing varnish 
hits,
  
  It may raise varnish hits (since everything would be a varnish hit), but as 
for reducing PHP calls, I am not sure about that: instead of fetching only the 
newest edit, if an entry is edited 100 times you now need to fetch 100 edits. 
That's 100x the PHP calls.
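  To make the arithmetic concrete (numbers are illustrative, not measurements):

    # Edits per entity arriving within one update window.
    edits_in_window = {"Q42": 100, "Q64": 3}

    # Current approach: fetch only the newest revision per entity.
    fetches_latest_only = len(edits_in_window)

    # Per-revision approach: fetch Special:EntityData for every intermediate
    # revision, so each one becomes varnish-cacheable, but PHP still has to
    # render each of them at least once.
    fetches_per_revision = sum(edits_in_window.values())

    print(fetches_latest_only, fetches_per_revision)  # 2 vs. 103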
  
  > PHP is being hit very roughly with 12.5 million requests to turn some PHP 
object into RDF output for special entity data, we might want to just consider 
caching that in its own memcached key inside wikibase so we only have to do 
that conversion once per revision
  
  May be worth considering, but we have tons of revisions - do we have enough 
memory for such a cache? Some entries are huge, and if one letter changes in 
30 MB of RDF, we'd be storing two 30 MB revisions differing in one byte. Of 
course, we could limit the size of the cacheable RDF - I'm not sure how many 
resources would then end up cached.
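  If we did try it, it would look roughly like this (Python sketch using a 
memcached client such as pymemcache; the key scheme, TTL and size cap are made 
up here, not an existing Wikibase feature):

    from pymemcache.client.base import Client

    MAX_CACHEABLE_BYTES = 2 * 1024 * 1024    # skip huge serializations (e.g. 30 MB RDF)
    cache = Client(("127.0.0.1", 11211))

    def get_rdf(entity_id, revision_id, render):
        """render() stands in for the expensive PHP-side entity-to-RDF conversion."""
        key = "wdrdf:%s:%d" % (entity_id, revision_id)
        cached = cache.get(key)
        if cached is not None:
            return cached
        rdf = render(entity_id, revision_id)
        if len(rdf) <= MAX_CACHEABLE_BYTES:
            cache.set(key, rdf, expire=24 * 3600)  # one entry per revision
        return rdf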

TASK DETAIL
  https://phabricator.wikimedia.org/T217897
