Addshore created this task. Addshore added projects: Wikidata, wdwb-tech. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION

**Introduction**

During Wikibase local development we noticed something odd and slow:

1. Create some new Wikibase items.
2. Create a final Wikibase item that refers to all of the other items you just wrote.
3. `derivedDataUpdater->prepareUpdate` and `derivedDataUpdater->doUpdates` are called at the end of the same edit request, triggering parser output generation (which in Wikibase ends up doing a fair amount of work).

**Problem**

Generating the HTML ends up being the "expensive" part of the ParserOutput, as it needs to load data from a whole collection of other entities. There is secondary storage and caching etc. in place, but ideally, when the HTML is not needed, we would not do this extra work at all; and if it is needed for some reason, we would not do it pre-send in the API edit call.

In many cases this HTML for the ParserOutput is not used immediately after the API request, as most edits on Wikidata.org are made by bots. Even for edits made by users, they will not normally reload the page after the edit, as editing generally happens in JS with on-page elements changing instead.

For third-party Wikibase users who want to do large bulk imports of data this is an even more pressing issue: they will often have fewer resources, less speed, less caching etc., and may not even have parser caching enabled, but after an API request to edit an entity, their application will still go and generate this possibly unneeded HTML parser output.

**Other details**

For search indexing we already generate parser output with the `generate-html => false` hint, as part of T239931 <https://phabricator.wikimedia.org/T239931>.
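As a rough standalone illustration of why that hint saves work (this is not the MediaWiki API — the function and array names here are invented for the sketch), the expensive step is that every referenced entity has to be resolved before HTML can be built, while the non-HTML metadata is cheap:

```php
<?php
// Standalone sketch (hypothetical names, not MediaWiki/Wikibase APIs):
// what a generate-html => false hint skips. Entity labels stand in for
// data that would otherwise be loaded from secondary storage.

$labels = [ 'Q1' => 'universe', 'Q2' => 'Earth' ];

/**
 * Build a minimal "parser output" for an item that links to other items.
 * With $generateHtml = false, link metadata is still recorded, but the
 * per-referenced-entity lookups needed to render HTML are skipped.
 */
function buildParserOutput( array $linkedIds, bool $generateHtml, array $labels ): array {
	$output = [ 'links' => $linkedIds, 'html' => null, 'lookups' => 0 ];
	if ( $generateHtml ) {
		$parts = [];
		foreach ( $linkedIds as $id ) {
			$output['lookups']++; // one storage hit per referenced entity
			$parts[] = $labels[$id] ?? $id;
		}
		$output['html'] = '<ul><li>' . implode( '</li><li>', $parts ) . '</li></ul>';
	}
	return $output;
}

$withHtml = buildParserOutput( [ 'Q1', 'Q2' ], true, $labels );
$noHtml   = buildParserOutput( [ 'Q1', 'Q2' ], false, $labels );
```

With the hint, `$noHtml` carries the same links as `$withHtml` but needed zero entity lookups, which is the saving the search-indexing path already gets.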
We already have a split parser cache, but the canonical parser output is used for `en` page views, for example:

- canonical: `wikidatawiki:pcache:idhash:3369-0!termboxVersion=1!wb=3`
- other: `wikidatawiki:pcache:idhash:3369-0!termboxVersion=1!userlang=pt!wb=3`

**Possible solution**

If MediaWiki asked the content type whether HTML should be generated for it post edit, then we would be able to answer with a `generate-html => false` hint.

We already have a "hack" or two to change when parser output is or isn't cached, such as https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/c566765494f366c06b56a4ae3a2257d378b93222/repo/includes/Content/EntityContent.php#180

From the chat:

> I see the generateHtml thing again; no surprise it is true by default for the entity view code path: https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/c566765494f366c06b56a4ae3a2257d378b93222/repo/includes/Content/EntityContent.php#239
> Also I note https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/c566765494f366c06b56a4ae3a2257d378b93222/repo/includes/Content/EntityContent.php#180
> So this could all be solved, with a similar hack, in Wikibase somehow, maybe?
> Yes, sorry, there are the 2 parts; I wasn't very clear in my messages above. 1) MediaWiki asks content types somehow if they need HTML for this post-edit ParserOutput generation stage. 2) A similar hack so that if no HTML is generated, nothing is saved in the parser cache.
> 2 seems like the easy and clear bit now, but 1 is the main part.
> <Krinkle> Ah I see. Skip HTML generation during the canonical parse for an edit, but don't save it.

**Predicted impact**

> <Krinkle> Yeah, you wanted API edits to do less work, right?
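To summarize the two-part proposal from the chat as a standalone sketch (all function and content-model names here are hypothetical, not the actual MediaWiki hook or API surface):

```php
<?php
// Sketch of the two proposed parts (hypothetical names, not MediaWiki APIs):
//  1) MediaWiki asks the content type whether post-edit ParserOutput
//     generation needs HTML;
//  2) if no HTML was generated, nothing is written to the parser cache.

/**
 * Part 1: the content type answers whether HTML is needed post edit.
 * A Wikibase entity could opt out; wikitext keeps the default behaviour.
 */
function needsHtmlAfterEdit( string $contentModel ): bool {
	return !in_array( $contentModel, [ 'wikibase-item', 'wikibase-property' ], true );
}

/**
 * Part 2: mirrors the existing EntityContent "hack" that suppresses
 * caching — a ParserOutput without HTML must never be written to the
 * parser cache, or it could be served to a page view.
 */
function shouldWriteParserCache( bool $htmlGenerated ): bool {
	return $htmlGenerated;
}

$generateHtml = needsHtmlAfterEdit( 'wikibase-item' );
$writeCache   = shouldWriteParserCache( $generateHtml );
```

Part 2 is the "easy and clear bit"; part 1 is the main part, since it needs a decision point in core where the content type is consulted before the post-edit parse.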
> Yes, that probably has some knock-on effects.

TASK DETAIL
https://phabricator.wikimedia.org/T285987

To: Addshore
Cc: Tarrow, daniel, Krinkle, Addshore, Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list -- [email protected] To unsubscribe send an email to [email protected]
