Lucas_Werkmeister_WMDE added a comment.

  I looked at this a bit in XHGui. I’ll go through the requests I made in 
chronological order; hopefully that makes sense.
  
  First up is this XHGui run for a diff timeout 
<https://performance.wikimedia.org/xhgui/run/view?id=6155975f1198a64d9c8a9a59> 
(for the first URL in the task description 
<https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P570&curid=15087958&diff=358447430&oldid=358294930>).
There are two functions taking up the majority of the wall time: 
`Wikimedia\Rdbms\DatabaseMysqli::doQuery` and 
`Preprocessor_Hash::buildDomTreeArrayFromText`. `doQuery()` is called 5502 
times, which feels a bit excessive.
  
  F34662685: Screenshot 2021-09-30 at 15-20-34 XHGui - Profile - 
6826fe25-6fdb-4df6-ac06-cd54c59578ff w index php.png 
<https://phabricator.wikimedia.org/F34662685>
  
  I then got this XHGui run for a non-diff, non-timeout request 
<https://performance.wikimedia.org/xhgui/run/view?id=6155ba1e6686dd4bbcee48c1>, 
for the latest version of the same page, with `action=purge` to make sure the 
page was actually re-parsed rather than served from the parser cache. The same 
two functions are still at the top of the wall-time “leaderboard”, but with the 
wikitext→DOM parsing well ahead of the database queries; `doQuery()` also only 
has 309 calls this time, an order of magnitude fewer.
  
  F34662691: Screenshot 2021-09-30 at 15-23-44 XHGui - Profile - 
313bd5cb-0c3b-45de-8df1-b07f0f666a01 wiki Wikidata Database_reports 
Con[...].png <https://phabricator.wikimedia.org/F34662691>
  
  I then realized that comparing the latest version of the page to that old 
diff link wasn’t entirely representative, since the size of the page could’ve 
changed a lot. So I took the original URL again and removed the diff parameters 
while keeping the `&oldid=`. The result is this XHGui run for a non-diff timeout 
<https://performance.wikimedia.org/xhgui/run/view?id=6155bb7f02a14cb1062baf00> 
(URL 
<https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P570&curid=15087958&oldid=358294930>), 
where `doQuery()` suddenly gets called far less often, only 253 times.
  
  F34662699: Screenshot 2021-09-30 at 15-30-09 XHGui - Profile - 
f06dafba-f0d7-4196-9556-a16513243495 w index php.png 
<https://phabricator.wikimedia.org/F34662699>
  
  Finally, I wondered whether the decreasing `doQuery()` counts were just 
because some results (e.g. item labels) had been loaded from the database in 
the first request but were served from a cache afterwards, so I loaded the 
original diff URL again. This gave me another diff timeout XHGui run 
<https://performance.wikimedia.org/xhgui/run/view?id=6155bc60d7cc9b27fcc30040>, 
again with 5264 `doQuery()` calls taking a lot of the time, so caching alone 
doesn’t explain the difference.
  
  F34662720: Screenshot 2021-09-30 at 15-38-30 XHGui - Profile - 
898c7663-7125-40b7-9726-aa984451cab4 w index php.png 
<https://phabricator.wikimedia.org/F34662720>
  
  I think there are two different issues here:
  
  1. Rendering some revisions of these huge pages just takes a long time, as 
evidenced by the third request (not a diff) spending most of its time in 
`Preprocessor_Hash` before eventually timing out.
  2. Rendering some diffs of these pages not only takes a long time, but also 
makes an almost exorbitant number of database queries.
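
  To put the four runs side by side, here is a small Python sketch that takes 
the `doQuery()` call counts reported above and divides each one by the count 
for the plain old-revision view (the third request). It only does arithmetic 
on those numbers; the run labels are my own informal shorthand, not anything 
XHGui reports. Both diff requests come out at roughly 21 times the query count 
of rendering the same revision without a diff (5502 / 253 ≈ 21.7, 
5264 / 253 ≈ 20.8).

```python
# doQuery() call counts copied from the four XHGui runs described above.
# The dictionary keys are informal labels, not anything XHGui reports.
DOQUERY_CALLS = {
    "diff, timeout (1st request)": 5502,
    "latest revision + purge, no timeout": 309,
    "old revision, no diff, timeout": 253,
    "diff, timeout (repeat request)": 5264,
}


def ratios_to(counts: dict[str, int], baseline: str) -> dict[str, float]:
    """Return each run's call count divided by the baseline run's count."""
    base = counts[baseline]
    return {name: calls / base for name, calls in counts.items()}


if __name__ == "__main__":
    baseline = "old revision, no diff, timeout"
    for name, ratio in ratios_to(DOQUERY_CALLS, baseline).items():
        print(f"{name:<36} {DOQUERY_CALLS[name]:>5} calls  {ratio:5.1f}x")
```

  Using the non-diff render of the same `oldid` as the baseline keeps page 
size out of the comparison, which is why the latest-revision run is not used 
as the reference.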

TASK DETAIL
  https://phabricator.wikimedia.org/T140879

_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org