Addshore created this task.
Addshore added projects: Wikidata, wdwb-tech.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  **Introduction**
  
  During Wikibase local development we noticed something odd and slow
  
  1. Create some new Wikibase items
  2. Create a final Wikibase item that refers to all of the other items you 
just wrote
  3. `derivedDataUpdater->prepareUpdate` and `derivedDataUpdater->doUpdates` 
are called, triggering parser output generation at the end of the same edit 
request (which in Wikibase ends up doing some amount of work)
  
  **Problem**
  
  Generation of HTML ends up being the "expensive" part of ParserOutput, as it 
needs to load data from a whole collection of other entities.
  There is secondary storage, caching, etc. in place, but ideally, when it is 
not needed, we would not do this extra work. And if it is needed for some 
reason, we would not do it pre-send in the API edit call.
  
  In many cases this HTML for the ParserOutput is not used immediately after 
the API request, as most edits on Wikidata.org are made by bots.
  Even for edits made by users, post edit they will not normally reload the 
page, as editing generally happens in JS with on page elements changing instead.
  For third-party Wikibase users who want to do large bulk imports of data 
this is an even more pressing issue, as they will often have fewer resources, 
less speed, caching, etc., and may not even have parser caching enabled, but 
after an API request to edit a Wikidata entity, the application will still go 
and generate this possibly unneeded HTML parser output.
  
  **Other details**
  
  For search indexing we already generate parser output with the `generate-html 
=> false` hint as part of T239931 <https://phabricator.wikimedia.org/T239931>.
  
  We already have a split parser cache, but the canonical parser output is used 
for en page views, for example:
  
  - canonical `wikidatawiki:pcache:idhash:3369-0!termboxVersion=1!wb=3`
  - other `wikidatawiki:pcache:idhash:3369-0!termboxVersion=1!userlang=pt!wb=3`
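
  As a rough illustration of how the two keys above differ, here is a minimal 
Python sketch that rebuilds them. It is not MediaWiki's actual key-building 
code: the function name, the `DEFAULTS` table, and the rule that options left 
at their default value are dropped from the key are assumptions made for this 
example. The `idhash:3369-0` part is copied verbatim from the keys above.

```python
# Hypothetical helper, for illustration only; not a MediaWiki API.
DEFAULTS = {"userlang": "en"}  # assumption: "en" is the default user language


def parser_cache_key(wiki, page_id, used_options, defaults=DEFAULTS):
    """Build a key shaped like the two example keys in this task."""
    # Assumption: only options that differ from their default vary the key,
    # and they are appended in sorted order.
    varying = {k: v for k, v in used_options.items() if defaults.get(k) != v}
    suffix = "".join(f"!{name}={varying[name]}" for name in sorted(varying))
    return f"{wiki}:pcache:idhash:{page_id}-0{suffix}"


canonical = parser_cache_key(
    "wikidatawiki", 3369, {"termboxVersion": 1, "wb": 3, "userlang": "en"}
)
other = parser_cache_key(
    "wikidatawiki", 3369, {"termboxVersion": 1, "wb": 3, "userlang": "pt"}
)
print(canonical)
print(other)
```

  Under these assumptions an English page view maps onto the canonical key, 
which is why the canonical parser output ends up serving en page views.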
  
  **Possible solution**
  
  If MediaWiki asked the content type whether HTML should be generated for it 
post edit, then we would be able to say that a `generate-html => false` hint 
should be used.
  
  We already have a "hack" or 2 to change when parser output is or isn't 
cached, such as 
https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/c566765494f366c06b56a4ae3a2257d378b93222/repo/includes/Content/EntityContent.php#180
  
  I see the generateHtml thing again, no surprise it is true by default for the 
entityview code path 
https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/c566765494f366c06b56a4ae3a2257d378b93222/repo/includes/Content/EntityContent.php#239
  4:24 PM 
  also I note 
https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/c566765494f366c06b56a4ae3a2257d378b93222/repo/includes/Content/EntityContent.php#180
  4:24 PM 
  So, this could be all solved, with a similar hack, in wikibase somehow maybe?
  4:25 PM 
  yes, sorry, there are 2 parts, I wasn't very clear in my messages above. 
1) MediaWiki asks content types somehow if they need HTML for this post edit PO 
generation stage 2) similar hack so that if no HTML is generated, nothing is 
saved in the pcache
  4:30 PM 
  2 seems like the easy and clear bit now, but 1 is the main part
  4:30 PM <Krinkle> Timo Tijhof | https://timotijhof.net 
  Ah I see. Skip html generating during canonical parse for an edit but don't 
save it
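
  The two parts discussed above could be modelled roughly as follows. This is 
a toy Python sketch, not MediaWiki or Wikibase code: every class, method, and 
field name here (`EntityContentSketch`, `needs_html_after_edit`, 
`ParserCacheSketch`, the `links` metadata) is hypothetical and only stands in 
for whatever mechanism would actually be added.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParserOutput:
    metadata: dict
    html: Optional[str] = None  # None means HTML generation was skipped


class EntityContentSketch:
    """Hypothetical stand-in for a content type such as a Wikibase entity."""

    def needs_html_after_edit(self) -> bool:
        # Part 1: the question MediaWiki would ask the content type post edit.
        return False

    def get_parser_output(self, generate_html: bool) -> ParserOutput:
        # Placeholder metadata; assumed to be available without rendering.
        metadata = {"links": ["Q1", "Q2"]}
        if not generate_html:
            return ParserOutput(metadata)
        # The expensive part: rendering needs data from the referenced entities.
        return ParserOutput(metadata, html="<div>rendered entity</div>")


@dataclass
class ParserCacheSketch:
    store: dict = field(default_factory=dict)

    def save(self, key: str, output: ParserOutput) -> bool:
        # Part 2: an output without HTML is never written to the cache.
        if output.html is None:
            return False
        self.store[key] = output
        return True


def post_edit_update(content, cache, key):
    """Generate the post-edit parser output, honouring the content's answer."""
    wants_html = content.needs_html_after_edit()
    output = content.get_parser_output(generate_html=wants_html)
    cache.save(key, output)
    return output


cache = ParserCacheSketch()
result = post_edit_update(EntityContentSketch(), cache, "Q42")
```

  In this model the edit request still produces the metadata, but does no 
HTML rendering and leaves the parser cache untouched, so the first real page 
view would have to render and cache the HTML itself.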
  
  **Predicted impact**
  
  Yeah, you wanted api edits to do less work right?
  Yes
  
  that probably has some knock-on effects

TASK DETAIL
  https://phabricator.wikimedia.org/T285987

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: Tarrow, daniel, Krinkle, Addshore, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- [email protected]
To unsubscribe send an email to [email protected]
