[Wikidata-bugs] [Maniphest] T283466: topic overlap between Wikipedia language versions

2022-08-15 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Thank you! :)
  
  I unfortunately don't have any good tips for scaling.

TASK DETAIL
  https://phabricator.wikimedia.org/T283466

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Darcyisverycute, Lydia_Pintscher
Cc: Darcyisverycute, amy_rc, WMDE-leszek, GoranSMilovanovic, EpicPupper, 
Manuel, Aklapper, Lydia_Pintscher, Astuthiodit_1, Alan_Ang-WMDE, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dinadineke, DannyS712, Nandana, 
tabish.shaikh91, Lahi, Gq86, Jayprakash12345, JakeTheDeveloper, QZanden, 
merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Omar_sansi, 
Wikidata-bugs, aude, TheDJ, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org



2022-08-15 Thread Darcyisverycute
Darcyisverycute claimed this task.
Darcyisverycute added a comment.


  {F35452779} {F35452776}
  Sorry I didn't have time to write here yesterday; I worked on this as part of 
the hackathon. I gave a presentation (slides and data as an xlsx export are 
attached; the export doesn't render well, so I also published it anonymously 
online here 

 ). There is no fast way to test whether the article for a given Wikidata item 
is in mainspace, so my approach works around that by relying instead on 
inclusion in a large encyclopedia ID system (I chose Encyclopedia Britannica; 
details are in the slides). This makes it fast enough to compare two languages 
across the ~170k items in that ID system within the 1 minute query timeout 
window on https://query.wikidata.org/
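
  A pairwise comparison of this kind could be generated along the following 
lines. This is only a sketch and not the query from the slides: the helper name 
`missing_count_query` is hypothetical, P1417 is assumed to be the Encyclopædia 
Britannica Online ID property, and a sitelink is used as a stand-in for "has an 
article".

```python
# Hypothetical helper; the query shape is an assumption, not taken from the slides.
BRITANNICA_ID = "P1417"  # assumed: Encyclopædia Britannica Online ID property


def missing_count_query(covered: str, missing: str,
                        id_property: str = BRITANNICA_ID) -> str:
    """Build a SPARQL query that counts items which carry `id_property`,
    have a sitelink to the `covered` Wikipedia and none to the `missing` one."""
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:{id_property} [] .
  ?inCovered schema:about ?item ;
             schema:isPartOf <https://{covered}.wikipedia.org/> .
  FILTER NOT EXISTS {{
    ?inMissing schema:about ?item ;
               schema:isPartOf <https://{missing}.wikipedia.org/> .
  }}
}}
""".strip()


if __name__ == "__main__":
    # Text of the query for "in enwp but not in dewp"; paste it into the
    # query service to run it.
    print(missing_count_query("en", "de"))
```

  Restricting the first triple pattern to the ID property is what keeps the 
candidate set small enough for a single query.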
  
  So to fill out the rest of the matrix, I just need to work out a way either 
to programmatically combine the queries into a table and run them on a database 
dump, or to run queries of the form shown in my presentation sequentially 
(possibly also against a database dump). The full matrix is ~170 language wikis 
across 250+ languages, so about 28,900 (170 × 170) queries in total if we wanted 
the full table. @Lydia_Pintscher do you have any advice on scaling up this 
approach?
  
  (NB my spreadsheet is the same as in the idea description but transposed)
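
  The query count above can be checked with a few lines. This is a sketch with 
hypothetical names (`ordered_pairs`, `query_counts`); the three wiki codes are 
placeholders.

```python
from itertools import permutations


def ordered_pairs(wikis):
    """Every (covered, missing) pair of two different wikis; order matters
    because "in A but not in B" differs from "in B but not in A"."""
    return list(permutations(wikis, 2))


def query_counts(n_wikis):
    """Return (cells in the full n x n grid, cells without the diagonal)."""
    return n_wikis * n_wikis, n_wikis * (n_wikis - 1)


if __name__ == "__main__":
    print(len(ordered_pairs(["en", "de", "fr"])))  # 6 off-diagonal cells
    print(query_counts(170))                       # (28900, 28730)
```

  The diagonal cells are empty by definition, so 28,730 queries would be enough. 
If they ran sequentially and each took up to the 1 minute timeout, that would 
be an upper bound of roughly 20 days.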


2022-08-11 Thread Lydia_Pintscher
Lydia_Pintscher added a project: Wikimania-Hackathon-2022.


2021-05-27 Thread GoranSMilovanovic
GoranSMilovanovic removed projects: User-GoranSMilovanovic, 
WMDE-Analytics-Engineering.


2021-05-25 Thread GoranSMilovanovic
GoranSMilovanovic removed GoranSMilovanovic as the assignee of this task.


2021-05-25 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: WMDE-leszek.
GoranSMilovanovic added a comment.


  @Lydia_Pintscher @Manuel @WMDE-leszek
  
  Before we proceed with this, please take a look at our WDCM Sitelinks 
Dashboard:
  
  - Wiki View tab, and then
    - Wiki Similarity
  
  I would say that the similarity graph presented there is pretty close to what 
you are looking for.
  
  Maybe we should just think about extending the functionality of this WDCM 
system component instead of going for a new data product?


2021-05-24 Thread Bugreporter
Bugreporter updated the task description.


2021-05-24 Thread GoranSMilovanovic
GoranSMilovanovic added projects: WMDE-Analytics-Engineering, 
User-GoranSMilovanovic.


2021-05-24 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task.


2021-05-23 Thread Lydia_Pintscher
Lydia_Pintscher created this task.
Lydia_Pintscher added projects: Wikidata, patch-welcome.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  **Idea:**
  The different language Wikipedias cover very different topics in their 
articles. With the sitelinks on Wikidata we have data to analyze this further. 
It'd be useful to have an overview of the overlap of articles between the 
different language versions of Wikipedia. We want to make the result of this 
actionable.
  
  **This could look something like this:**
  
  |               | not covered in enwp | not covered in dewp | not covered in frwp |
  | enwp articles | -                   | 10                  | 50                  |
  | dewp articles | 42                  | -                   | 12                  |
  | frwp articles | 15                  | 150                 | -                   |
  
  Each cell could then link to a list of missing topics to make it actionable. 
Preferably the list would be ordered by the number of other Wikipedias that 
cover the topic.
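
  Given the sitelinks per wiki, each cell and its ordered list reduce to set 
differences and a coverage count. The sketch below uses made-up toy data and 
hypothetical function names; the real input would come from Wikidata sitelinks.

```python
from collections import Counter

# Made-up toy data: wiki code -> set of Wikidata item IDs with a sitelink there.
sitelinks = {
    "enwp": {"Q1", "Q2", "Q3", "Q4"},
    "dewp": {"Q1", "Q2", "Q5"},
    "frwp": {"Q1", "Q3", "Q5", "Q6"},
}


def missing_matrix(sitelinks):
    """Map (row, col) to the items covered in `row` but not covered in `col`."""
    return {(row, col): sitelinks[row] - sitelinks[col]
            for row in sitelinks for col in sitelinks if row != col}


def ranked_missing(sitelinks, row, col):
    """Items of one cell, most widely covered first; ties broken by item ID."""
    coverage = Counter(item for items in sitelinks.values() for item in items)
    cell = sitelinks[row] - sitelinks[col]
    return sorted(cell, key=lambda item: (-coverage[item], item))


if __name__ == "__main__":
    for (row, col), items in sorted(missing_matrix(sitelinks).items()):
        print(f"{row} articles not covered in {col}: {len(items)}")
    print(ranked_missing(sitelinks, "frwp", "enwp"))
```

  Because the target wiki never covers an item in its own cell, ranking by total 
coverage gives the same order as ranking by the number of other Wikipedias that 
cover the topic.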
  
  **Notes:**
  
  - We should make it clear that there are good reasons for some topics not 
being covered in a Wikipedia and it is not always necessary to create a new 
article. These reasons can include:
    - the topic is not considered notable for that Wikipedia
    - the topic is covered, but as a paragraph in another article, for example
  - Later this could be expanded to the other Wikimedia projects.
