daniel added a comment.
In https://phabricator.wikimedia.org/T103429#2127044, @ori wrote:

> Doing this via the job queue is not a good solution, in my opinion. There is something fundamentally broken about tying updates to parser cache events.

Why? We are indeed trying to track which parser cache entry uses what information from Wikidata. No more and no less.

> Subscribing to changes via a ParserCacheSaveComplete hook handler is a bit like a department store updating its inventory any time a customer touches an item: sure, it covers all the relevant transactions, but it also creates a large amount of unnecessary work, because customers may handle an item for a number of reasons (to try it on or read the label), not all of which result in a transaction.

We only track an item if the parser actually accesses it. That is, only when the content in the parser cache //actually// depends on the data item.

> ParserCacheSaveComplete events are the same way: they cover all the cases in which an edit is made, but they also fire in cases which do not involve a transaction. For example, the mobile and desktop web sites use different key-spaces to avoid polluting each other's parser cache entries with platform-specific artifacts, so there are at least two ParserCacheSaveComplete events for each edit.
>
> Hooking into ParserCacheSaveComplete is the wrong thing to do, because (AIUI) it isn't really the parser cache that Wikibase cares about, and because it muddles the distinction between read-only and read/write requests.

It's exactly the ParserCache that Wikibase cares about. We need to know which information from Wikidata has been used to construct the HTML that is in the ParserCache, so we can purge the cache when that data changes. I don't see any other use case. Do you have an alternative suggestion that would achieve this?
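To make the purging argument concrete, here is a minimal in-memory sketch of usage tracking as described in this reply: when a page's parser output is saved, record which aspects of which items the parser actually accessed; when an item changes, look up by item which pages need purging. The class name, method names, and aspect strings ("label", "sitelinks") are hypothetical illustrations, not the Wikibase API; the real tracking lives in the database, not in Python dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Hypothetical stand-in for a usage-tracking table.

    Each entry records that the cached HTML of one page used one aspect
    (e.g. "label" or "sitelinks") of one Wikidata item.
    """

    # page -> {(item_id, aspect)}: the per-page direction, like the parser cache key
    by_page: dict = field(default_factory=lambda: defaultdict(set))
    # (item_id, aspect) -> {page}: the reverse index that purging needs
    by_usage: dict = field(default_factory=lambda: defaultdict(set))

    def record_parse(self, page, usages):
        """Replace a page's recorded usages after its output is saved to the cache."""
        # Drop the usages recorded for the previous rendering of this page.
        for usage in self.by_page.pop(page, set()):
            self.by_usage[usage].discard(page)
        self.by_page[page] = set(usages)
        for usage in self.by_page[page]:
            self.by_usage[usage].add(page)

    def pages_to_purge(self, item_id, changed_aspects):
        """Return pages whose cached HTML used a changed aspect of item_id."""
        pages = set()
        for aspect in changed_aspects:
            pages |= self.by_usage.get((item_id, aspect), set())
        return pages


if __name__ == "__main__":
    tracker = UsageTracker()
    tracker.record_parse("Foo", {("Q123", "label")})
    tracker.record_parse("Bar", {("Q123", "sitelinks")})
    # A sitelink change on Q123 purges only the page that used its sitelinks.
    print(tracker.pages_to_purge("Q123", {"sitelinks"}))  # {'Bar'}
```

Keeping both directions of the index is the crux of the disagreement: a store keyed per page (`by_page`) is enough to render a page, but purging after an item edit needs the per-item direction (`by_usage`).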
Once we have T105766: RFC: Dependency graph storage; sketch: adjacency list in DB <https://phabricator.wikimedia.org/T105766>, we will no longer need this, but even then we will need a place to store dependencies between a generated resource and whatever it was generated from.

When do you think this generates //unnecessary// work? Our tracking is fairly fine-grained, provided people use the parser function and Lua module in a sane way. We won't purge a page that uses the label of Q123 when a sitelink on Q123 changes, etc.

One alternative I can think of is to store the tracking information in the parser cache itself, instead of the database. But there are two problems with that:
- the tracking info must never expire before the actual cache entry;
- we need to be able to query usages by item, but the parser cache is keyed per local page.

So, until we have T105766, I don't see an alternative.

TASK DETAIL
https://phabricator.wikimedia.org/T103429
