Smalyshev added a comment.
> I'm still a bit confused about this logic inside the updater, especially with this id validation checking if we have the revision already etc?

Not sure what you mean by "already". You can have the revision ID in the change and the revision ID in Wikidata, but you still have to check against the revision ID in Blazegraph, so that you do not replace newer data with older data (a rough sketch of that check is at the end of this comment).

> hold the latest revision of an entity in some internal queue in the updater for a few second while waiting for more updates, and then just commit that to blazegraph for storage after a few seconds

Not sure how holding it in the queue for a few seconds would help anything. You'd just time-shift the whole process several seconds into the past, but otherwise nothing would change. If you mean batching the updates, we already do that. But a batch covering several seconds of updates would be huge (some bots do hundreds of updates per second), and putting them into SPARQL queries would make those queries very slow. If we split them, we slow the process down and take the risk that the whole update was useless because new data has already arrived. I am not sure how waiting for a few seconds helps anything beyond what the current process is already doing, and it introduces additional complexity: we could no longer assume we're working with the latest data and would always have to track which delayed update the data relates to. Maybe I misunderstand something in your proposal.

> This means less reducing the php calls dramatically, increasing varnish hits,

It may raise varnish hits (since everything would be a varnish hit), but as for reducing PHP calls, I am not sure about that: instead of fetching only the newest edit, if an entry is edited 100 times you now need to fetch 100 edits instead. That's 100x the PHP calls.

> PHP is being hit very roughly with 12.5 million requests to turn some PHP object into RDF output for special entity data, we might want to just consider caching that in its own memcached key inside wikibase so we only have to do that conversion once per revision

May be worth considering, but we have tons of revisions - do we have enough memory for such a cache? Some entries are huge, and if one letter changes in a 30 MB RDF document, we'd be storing two 30 MB revisions differing by one byte. Of course, we could limit the size of the cacheable RDF - I'm not sure how many resources would still be cacheable then.
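To illustrate the per-revision cache with a size limit, here is a minimal sketch. The key scheme, the size cap, and the in-memory map standing in for memcached are all assumptions for illustration, not existing Wikibase code.

```
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RevisionRdfCacheSketch {
    // Illustrative cap: do not cache RDF documents larger than this.
    private static final int MAX_CACHEABLE_BYTES = 1_000_000;

    // Stand-in for memcached; the key is "entityId:revisionId", so a new
    // revision gets a new key and the old one simply ages out.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String getOrRender(String entityId, long revision, Supplier<String> renderRdf) {
        String key = entityId + ":" + revision;
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // rendered once per revision, served many times
        }
        String rdf = renderRdf.get();
        if (rdf.getBytes(StandardCharsets.UTF_8).length <= MAX_CACHEABLE_BYTES) {
            cache.put(key, rdf); // only cache documents below the size cap
        }
        return rdf;
    }

    public static void main(String[] args) {
        RevisionRdfCacheSketch cache = new RevisionRdfCacheSketch();
        // First call renders; second call for the same revision is a cache hit.
        System.out.println(cache.getOrRender("Q42", 123L, () -> "<ttl for Q42 rev 123>"));
        System.out.println(cache.getOrRender("Q42", 123L, () -> "<should not be rendered again>"));
    }
}
```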
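And to make the "don't replace newer data with older data" check concrete, here is a minimal sketch of the idea. The class and method names are made up, and the in-memory map stands in for the revision that the real process would check in Blazegraph; this is not the actual Updater code.

```
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RevisionGuardSketch {
    // Stand-in for the revision currently recorded per entity in the store.
    private final Map<String, Long> storedRevisions = new ConcurrentHashMap<>();

    /** Returns true if the incoming change should be written to the store. */
    public boolean shouldApply(String entityId, long incomingRevision) {
        Long stored = storedRevisions.get(entityId);
        // Apply only if we have nothing yet, or the change is strictly newer
        // than what the store already holds; otherwise the write would
        // replace newer data with older data.
        return stored == null || incomingRevision > stored;
    }

    public void recordApplied(String entityId, long revision) {
        storedRevisions.merge(entityId, revision, Math::max);
    }

    public static void main(String[] args) {
        RevisionGuardSketch guard = new RevisionGuardSketch();
        guard.recordApplied("Q42", 1000L);
        System.out.println(guard.shouldApply("Q42", 999L));  // false: older revision
        System.out.println(guard.shouldApply("Q42", 1001L)); // true: newer revision
    }
}
```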
