Lucas_Werkmeister_WMDE added a comment.
I looked at this a bit in XHGui. I’ll explain the requests I made in chronological order; hopefully that’ll make sense.

First up is this XHGui run for a diff timeout <https://performance.wikimedia.org/xhgui/run/view?id=6155975f1198a64d9c8a9a59> (for the first URL in the task description <https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P570&curid=15087958&diff=358447430&oldid=358294930>). Two functions take up the majority of the wall time: `Wikimedia\Rdbms\DatabaseMysqli::doQuery` and `Preprocessor_Hash::buildDomTreeArrayFromText`. `doQuery()` is called 5502 times, which feels a bit excessive.

F34662685: Screenshot 2021-09-30 at 15-20-34 XHGui - Profile - 6826fe25-6fdb-4df6-ac06-cd54c59578ff w index php.png <https://phabricator.wikimedia.org/F34662685>

I then got this XHGui run for a non-diff non-timeout <https://performance.wikimedia.org/xhgui/run/view?id=6155ba1e6686dd4bbcee48c1>, for the latest version of the same page, with `action=purge` to make sure it actually did something. The same two functions are still at the top of the wall time “leaderboard”, but with the Wikitext→DOM parsing well ahead of the database queries; `doQuery()` also only has 309 calls this time, more than an order of magnitude fewer.

F34662691: Screenshot 2021-09-30 at 15-23-44 XHGui - Profile - 313bd5cb-0c3b-45de-8df1-b07f0f666a01 wiki Wikidata Database_reports Con[...].png <https://phabricator.wikimedia.org/F34662691>

I then realized that comparing the latest version of the page to that old diff link wasn’t entirely representative, since the size of the page could have changed a lot. So I took the original URL again and removed the diff parameters while keeping the `&oldid=`.
The result is this XHGui run for a non-diff timeout <https://performance.wikimedia.org/xhgui/run/view?id=6155bb7f02a14cb1062baf00> (URL <https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P570&curid=15087958&oldid=358294930>), where `doQuery()` is called far less often, only 253 times.

F34662699: Screenshot 2021-09-30 at 15-30-09 XHGui - Profile - f06dafba-f0d7-4196-9556-a16513243495 w index php.png <https://phabricator.wikimedia.org/F34662699>

Finally, I wondered whether the decreasing `doQuery()` counts were simply because some results (e.g. item labels) were loaded from the database in the first request but served from a cache afterwards, so I loaded the original diff URL again. This gave me another diff timeout XHGui run <https://performance.wikimedia.org/xhgui/run/view?id=6155bc60d7cc9b27fcc30040>, again with 5264 `doQuery()` calls taking a lot of the time.

F34662720: Screenshot 2021-09-30 at 15-38-30 XHGui - Profile - 898c7663-7125-40b7-9726-aa984451cab4 w index php.png <https://phabricator.wikimedia.org/F34662720>

I think there are two different issues here:

1. Rendering some revisions of these huge pages just takes a long time, as evidenced by the third request: it is not a diff, yet it spends the majority of its time in `Preprocessor_Hash` before eventually timing out.
2. Rendering some diffs of these pages not only takes a long time, but also makes an almost exorbitant number of database queries.

TASK DETAIL
https://phabricator.wikimedia.org/T140879