[Wikidata-bugs] [Maniphest] [Updated] T103429: Investigation: Parser save hook handler does master writes in GETs

daniel Sat, 19 Mar 2016 19:40:18 -0700

daniel added a comment.

  In https://phabricator.wikimedia.org/T103429#2127044, @ori wrote:

  > Doing this via the job queue is not a good solution, in my opinion. There 
is something fundamentally broken about tying updates to parser cache events.

  Why? We are indeed trying to track which parser cache entry uses which 
information from Wikidata. No more and no less.

  > Subscribing to changes via a ParserCacheSaveComplete hook handler is a bit 
like a department store updating its inventory any time a customer touches an 
item: sure, it covers all the relevant transactions, but it also creates a 
large amount of unnecessary work, because customers may handle an item for a 
number of reasons (to try it on or read the label), not all of which result in 
a transaction.

  We only track if the parser actually accesses the data item. That is, when 
the thing in the parser cache //actually// depends on the data item.

  > ParserCacheSaveComplete events are the same way: they cover all the cases 
in which an edit is made, but they also fire in cases which do not involve a 
transaction. For example, the mobile and desktop web sites use different 
key-spaces to avoid polluting each other's parser cache entries with 
platform-specific artifacts, so there are at least two ParserCacheSaveComplete 
events for each edit.
  > 
  > Hooking into ParserCacheSaveComplete is the wrong thing to do, because 
(AIUI) it isn't really the parser cache that Wikibase cares about, and 
because it muddles the distinction between read-only and read/write requests.

  It's exactly the ParserCache that Wikibase cares about. We need to know which 
information from Wikidata has been used to construct HTML that is in the 
ParserCache, so we can purge the cache when the data changes. I don't see any 
other use case. Do you have an alternative suggestion that would achieve this? 
Once we have T105766: RFC: Dependency graph storage; sketch: adjacency list in 
DB <https://phabricator.wikimedia.org/T105766> we will no longer need this, but 
even then, we will need a place to store dependencies between a generated 
resource and whatever it was generated from.

  When do you think this generates //unnecessary// work? Our tracking is 
fairly fine-grained, provided people use the parser function and Lua module in 
a sane way. We won't purge a page that uses a label of Q123 when a sitelink on 
Q123 changes, etc.
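
  To make the granularity concrete, here is a minimal sketch of aspect-level 
usage tracking, written in Python for brevity. The class, method, and aspect 
names are hypothetical and are not Wikibase's actual API; the sketch only 
illustrates that a usage is recorded when the parser reads an item, and that a 
purge is triggered only by a change to the aspect that was used.

```python
from collections import defaultdict

# Hypothetical aspect names; not taken from Wikibase's actual code.
LABEL = "label"
SITELINK = "sitelink"


class UsageTracker:
    """Records which (item, aspect) pairs each page's cached HTML depends on."""

    def __init__(self):
        # page title -> set of (item id, aspect) pairs used while parsing
        self.usages_by_page = defaultdict(set)

    def record(self, page, item_id, aspect):
        # Called only when the parser actually reads data from the item.
        self.usages_by_page[page].add((item_id, aspect))

    def pages_to_purge(self, item_id, changed_aspect):
        # A page is selected only if it used the aspect that changed.
        return sorted(
            page
            for page, usages in self.usages_by_page.items()
            if (item_id, changed_aspect) in usages
        )


tracker = UsageTracker()
tracker.record("Berlin", "Q123", LABEL)
tracker.record("Hamburg", "Q123", SITELINK)
```

  With this data, a sitelink change on Q123 selects only "Hamburg"; "Berlin", 
which uses just the label, stays cached.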

  One alternative I can think of is to store the tracking information in the 
parser cache itself, instead of the database. But there are two problems with 
that:

  - the tracking info must never expire before the actual cache entry
  - we need to be able to query usages by item. The parser cache is keyed per 
local page.
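
  The second problem can be sketched as follows, again in Python and with 
made-up data structures that only stand in for a page-keyed store. With 
storage keyed per page, finding the users of an item means scanning every 
entry; a separate index keyed by item answers the same question with a direct 
lookup.

```python
from collections import defaultdict

# Hypothetical stand-in for a store keyed per local page, like the parser
# cache: each entry carries the items its HTML was built from.
per_page_cache = {
    "Berlin": {"html": "<p>...</p>", "items": {"Q64", "Q183"}},
    "Hamburg": {"html": "<p>...</p>", "items": {"Q1055", "Q183"}},
    "Paris": {"html": "<p>...</p>", "items": {"Q90"}},
}


def pages_using_item_scan(cache, item_id):
    # With page-keyed storage, answering "who uses this item?" means
    # inspecting every entry.
    return sorted(
        page for page, entry in cache.items() if item_id in entry["items"]
    )


def build_item_index(cache):
    # A separate item -> pages index gives a direct lookup instead.
    index = defaultdict(set)
    for page, entry in cache.items():
        for item_id in entry["items"]:
            index[item_id].add(page)
    return index


item_index = build_item_index(per_page_cache)
```

  Both `pages_using_item_scan(per_page_cache, "Q183")` and 
`item_index["Q183"]` identify the same pages, but only the index avoids 
touching every cache entry.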

  So, until we have T105766: RFC: Dependency graph storage; sketch: adjacency 
list in DB <https://phabricator.wikimedia.org/T105766>, I don't see an 
alternative.

TASK DETAIL
  https://phabricator.wikimedia.org/T103429

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: daniel
Cc: ori, gerritbot, hoo, Addshore, Tobi_WMDE_SW, daniel, Lydia_Pintscher, 
Aklapper, aaron, D3r1ck01, Izno, Wikidata-bugs, aude, GWicke, Mbch331

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
