daniel added a comment.

Revisited after a first exploratory coding session showed the proposed solution 
to be problematic. An ad-hoc discussion with Thiemo and Jan resulted in going 
back to the one-aspect-per-language solution. Key points:

- The intent of tracking usage by aspect is to reduce the number of pages to 
purge when a change notification for an entity is received. Note that purging a 
page purges all renderings/variants in the cache.
- adding a render_key column greatly increases the size of the table
  - the number of aspects (per item/page combination) is //multiplied// by the 
number of render keys.
  - Example: let's say 200,000 image description pages on Commons use Q183 as a 
"tag", and use the label and local page title (L and T aspects), resulting in 
400k rows in the database. If on average each page is viewed in 2 languages, 
this would result in 800k rows; not only the rows for the L usage would be 
doubled, but the rows for the T usage too, even though that kind of usage does 
not care about language.
- adding a render_key does not provide any substantial advantage over using one 
aspect per language
  - the expected advantage was to cover cases in which some conditional on the 
page would result in different items and aspects being used when rendering the 
page for different users.
  - however, this is only possible (and sensible) if the conditional depends on 
a feature that also causes a parser cache split.
  - Besides user language, that could be things like the page being editable, 
or the thumbnail size, numbering of headings, date format, etc.
  - Besides the user language, these settings are mostly inaccessible to 
conditionals in wikitext/Lua. And if accessible, they are very unlikely to be 
used.
  - When receiving a change notification, the associated diff is used to 
determine which aspect of the entity changed, and thus, which usages are 
affected by the change.
  - From the diff, available features for this decision are the "section" 
(terms, sitelinks, statement, etc), the language (for labels, descriptions and 
aliases), and the site id (for sitelinks).
  - Only the features available from the diff can be used to determine the 
affected aspects. So if we tracked different usages per page depending on the 
user's thumbnail size, this information would not help to limit the number of 
pages to purge, since the diff contains no feature against which we could 
match the thumbnail size in the render_key.
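
To make the row counts and the purge decision concrete, here is a minimal 
sketch in Python. It is a toy in-memory model, not Wikibase code: the page 
names, the tuple layout, and the helpers `affected_aspects` and 
`pages_to_purge` are hypothetical, and the mapping from diff features to 
aspects is a simplifying assumption (only labels and sitelinks are modeled).

```python
# Toy model (all names hypothetical): which user languages each page
# has been rendered in. Every page uses Q183's label (L) and title (T).
renderings = {
    "File:A.jpg": ["de", "en"],
    "File:B.jpg": ["de", "en"],
    "File:C.jpg": ["en"],
}

# Option 1: render_key column. Every aspect row is repeated per render key,
# including the T row, which does not depend on language.
render_key_rows = {
    (page, "Q183", aspect, lang)
    for page, langs in renderings.items()
    for aspect in ("L", "T")
    for lang in langs
}

# Option 2: one aspect per language. Only the L aspect is split by language.
per_language_rows = set()
for page, langs in renderings.items():
    per_language_rows.add((page, "Q183", "T"))
    for lang in langs:
        per_language_rows.add((page, "Q183", f"L/{lang}"))


def affected_aspects(section, language=None):
    """Map diff features to aspects (assumed, simplified mapping)."""
    if section == "terms" and language is not None:
        return {f"L/{language}"}
    if section == "sitelinks":
        return {"T"}
    return set()


def pages_to_purge(rows, entity, aspects):
    """Pages with at least one tracked usage of an affected aspect."""
    return {page for page, ent, aspect in rows
            if ent == entity and aspect in aspects}


# A change to the German label only touches pages rendered in German.
german_label_purge = pages_to_purge(
    per_language_rows, "Q183", affected_aspects("terms", "de"))

# Row counts for the example in the text: 200,000 pages, 2 languages each.
pages_n, langs_n = 200_000, 2
rows_with_render_key = pages_n * 2 * langs_n    # L and T, both multiplied
rows_per_language = pages_n * (1 + langs_n)     # one T row, one L row per language
```

In this toy data the render_key variant stores 10 rows and the per-language 
variant 8; with the numbers from the example above the difference is 800k 
versus 600k rows. A change to the German label selects only the two pages 
that were rendered in German.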

Caveat affecting both options (render_key column, or "L/de"-style aspects): 
Updating the table is difficult

- when the page is //edited//, all tracking rows referring to it (with any 
render_key / language) should be removed/invalidated.
- when a page is rendered, only rows referring to the current 
render_key/language should be added/updated/removed.
- It's unclear whether there is any guarantee about the order in which hooks 
fire when a page is edited.
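
The two update rules above could be sketched roughly as follows. This is an 
illustrative in-memory model in Python, not the actual tracking code: the 
class `UsageTable`, its method names `on_edit` and `on_render`, and the row 
layout are all assumptions made for the example.

```python
class UsageTable:
    """Hypothetical in-memory stand-in for the usage tracking table."""

    def __init__(self):
        # Each row: (page, entity, aspect, render_key)
        self.rows = set()

    def on_edit(self, page):
        # An edit invalidates tracking for every render_key / language.
        self.rows = {r for r in self.rows if r[0] != page}

    def on_render(self, page, render_key, usages):
        # A render only replaces the rows for the current render_key;
        # rows recorded for other render keys stay untouched.
        self.rows = {r for r in self.rows
                     if not (r[0] == page and r[3] == render_key)}
        self.rows |= {(page, entity, aspect, render_key)
                      for entity, aspect in usages}


table = UsageTable()
table.on_render("File:A.jpg", "en", {("Q183", "L")})
table.on_render("File:A.jpg", "de", {("Q183", "L")})

# Re-rendering in German replaces only the German rows.
table.on_render("File:A.jpg", "de", {("Q64", "L")})

# If the edit hook fires after the render that follows the edit,
# the freshly recorded rows are wiped as well.
late = UsageTable()
late.on_render("File:A.jpg", "en", {("Q183", "L")})
late.on_edit("File:A.jpg")
```

The last three lines show why the hook order matters: if the "edited" 
handler runs after the handler for the first re-render, the table ends up 
empty for that page until the next render.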


TASK DETAIL
  https://phabricator.wikimedia.org/T90563

To: daniel
Cc: daniel, Aklapper, Rical, hoo, Lydia_Pintscher, Daniel_Mietchen, 
Wikidata-bugs, aude


