Darcyisverycute claimed this task.
Darcyisverycute added a comment.
{F35452779} {F35452776}
Sorry I didn't have time to write here yesterday; I worked on this as part of
the hackathon. I gave a presentation (slides and data attached as an xlsx
export; it doesn't render well, so I also published it anonymously online here
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTJKs0nFHxnvoBb6ztNZOQCntBk8KruWK5pZIf_8cxNedXexZ8Op12AOOCCQcEzSaWaZo0F4xj6u4HJ/pubhtml#>
as well). My approach works around the fact that there is no fast way to test
whether a given article about a Wikidata item is in mainspace: instead, I rely
on the item's inclusion in a large encyclopedia ID system (I chose Encyclopedia
Britannica; details in the slides). This is fast enough to run a comparison
between two languages across the ~170k items in that ID system within the
one-minute query timeout on https://query.wikidata.org/
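A query of roughly this form could be generated per language pair. This is a hedged sketch only: it assumes P1417 is the Britannica Online ID property, and the exact query shape used in the presentation may differ.

```python
# Sketch: build a WDQS query counting items in the Britannica ID system
# (assumed property P1417) that have an article in one language wiki but
# not in another. The query shape is illustrative, not necessarily the
# one from the presentation.
QUERY_TEMPLATE = """
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P1417 ?britannicaId .
  ?haveArticle schema:about ?item ;
               schema:isPartOf <https://{have}.wikipedia.org/> .
  FILTER NOT EXISTS {{
    ?missingArticle schema:about ?item ;
                    schema:isPartOf <https://{missing}.wikipedia.org/> .
  }}
}}
"""

def build_query(have: str, missing: str) -> str:
    """Return a SPARQL query counting items with an article on the
    `have` wiki but none on the `missing` wiki."""
    return QUERY_TEMPLATE.format(have=have, missing=missing)

print(build_query("en", "de"))
```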
So to fill out the rest of the matrix I just need to work out a way to
programmatically combine the queries into a table and run them on a database
dump, or run queries of the form in my presentation sequentially (possibly
also against a database dump). The full matrix covers ~170 language wikis
across 250+ languages, so about 28,900 queries in total for the full table.
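Running the queries sequentially amounts to iterating over ordered language pairs. A minimal sketch of the bookkeeping (the 170-wiki list is a placeholder; in practice it would come from the site matrix or the dump):

```python
from itertools import product

# Placeholder standing in for the ~170 language wikis to be compared.
wikis = [f"lang{i}" for i in range(170)]

# One query per ordered (have, missing) pair. Counting all ordered
# pairs, including the trivial diagonal, gives 170 * 170 = 28900,
# matching the ~28,900 figure for the full table.
pairs = list(product(wikis, repeat=2))
print(len(pairs))
```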
@Lydia_Pintscher do you have any advice on scaling up this approach?
(NB my spreadsheet is the same as in the idea description but transposed)
TASK DETAIL
https://phabricator.wikimedia.org/T283466