[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2023-11-21 Thread Aklapper
Aklapper removed GoranSMilovanovic as the assignee of this task.
Aklapper added a comment.


  @GoranSMilovanovic: Per emails from Sep 18 and Oct 20 and 
https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup , I am resetting 
the assignee of this task because there has not been progress lately (please 
correct me if I am wrong!). Resetting the assignee avoids the impression that 
somebody is already working on this task. It also allows others to potentially 
work towards fixing this task. Please claim this task again when you plan to 
work on it (via Add Action... > Assign / Claim in the dropdown menu) - it would 
be welcome. Thanks for your understanding!

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Aklapper
Cc: Mike_Peel, Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, 
Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2022-09-07 Thread Manuel
Manuel updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic, Manuel
Cc: Mike_Peel, Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-10-24 Thread GoranSMilovanovic
GoranSMilovanovic added a parent task: T285739: [Epic] Wikidata content quality 
analytics.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Mike_Peel, Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, 
maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-09-08 Thread Mike_Peel
Mike_Peel added a comment.


  For Commons, don't forget that the sitelink may be in a different item 
(category vs. topic or list item), which probably complicates the queries quite 
a bit.
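A minimal sketch of how a query could account for this, assuming entities shaped like the Wikidata JSON dump (a `sitelinks` dict keyed by site id such as `commonswiki`, and `claims` keyed by property id). It assumes the topic item points to its category item through `P910` (topic's main category); the helper names and the toy items are hypothetical, and other link patterns (e.g. list items) are not handled.

```python
from typing import Optional


def linked_item_ids(entity: dict, prop: str) -> list[str]:
    """Item ids used as values of `prop` in the entity's main snaks."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def find_commons_sitelink(entity: dict, entities: dict[str, dict]) -> Optional[str]:
    """Commons title for an item: its own 'commonswiki' sitelink if present,
    otherwise that of a category item linked via P910 (assumed link property)."""
    own = entity.get("sitelinks", {}).get("commonswiki")
    if own:
        return own["title"]
    for category_id in linked_item_ids(entity, "P910"):
        category = entities.get(category_id, {})
        link = category.get("sitelinks", {}).get("commonswiki")
        if link:
            return link["title"]
    return None


# Made-up topic item whose Commons sitelink lives on its category item.
topic = {
    "id": "Q1",
    "claims": {"P910": [{"mainsnak": {"snaktype": "value",
                                      "datavalue": {"value": {"id": "Q2"}}}}]},
    "sitelinks": {},
}
category = {"id": "Q2", "claims": {},
            "sitelinks": {"commonswiki": {"site": "commonswiki",
                                          "title": "Category:Example"}}}
entities = {"Q1": topic, "Q2": category}
print(find_commons_sitelink(topic, entities))
```

Counting only the item's own sitelinks would report the topic item above as having no Commons link.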

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic, Mike_Peel
Cc: Mike_Peel, Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, 
maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-26 Thread GoranSMilovanovic
GoranSMilovanovic added a subtask: T289810: Number and proportion of bot edits 
per projects.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-20 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  Here is a refinement of T288611#7293369:
  
  **Sitelinks Statistics**
  
  1. In **whole Wikidata**, we currently find `26,368,626` items (out of 
`91,437,737` items with `P31 instance of`, `P279 subclass of`, or `P361 part 
of`) with sitelinks: that would be about 28.84% of these items, implying 
**71.16%** of items w/o sitelinks.
  
  2. In **Astronomical Objects** alone, we currently find `354,814` items (out 
of `8,417,204`) with sitelinks: that would be about 4.22% of all Astronomical 
Objects items in Wikidata, implying **95.78%** of items w/o sitelinks.
  
  3. In **Scholarly Papers** alone, we currently find `20,700` items (out of 
`37,380,570`) with sitelinks: that is about 0.06%, so almost no items in 
Scholarly Papers have sitelinks.
  
  4. In **"core" Wikidata** (i.e. Wikidata - (Astronomical Objects + Scholarly 
Papers)), we currently find `25,993,112` items (out of `45,639,964` items with 
`P31 instance of`, `P279 subclass of`, or `P361 part of`) with sitelinks: that 
means that 57% of items in "core" Wikidata have sitelinks while **43% do not**.
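The percentages above follow directly from the reported counts. A short check, using only the numbers in this comment, recomputes them and tests the decomposition "core" = whole - (Astronomical Objects + Scholarly Papers); the dictionary layout and the helper name are illustrative.

```python
# Counts copied from the comment above: (items with sitelinks, items considered).
counts = {
    "whole": (26_368_626, 91_437_737),
    "astronomical": (354_814, 8_417_204),
    "scholarly": (20_700, 37_380_570),
    "core": (25_993_112, 45_639_964),
}


def pct_with_sitelinks(with_sl: int, total: int) -> float:
    """Share of items that have at least one sitelink, in percent (2 decimals)."""
    return round(100 * with_sl / total, 2)


for subset, (with_sl, total) in counts.items():
    share = pct_with_sitelinks(with_sl, total)
    print(f"{subset:>12}: {share:6.2f}% with sitelinks, {100 - share:6.2f}% without")

# "core" is defined as whole - (astronomical + scholarly), so the parts should add up.
parts = ("core", "astronomical", "scholarly")
sitelinked_parts = sum(counts[p][0] for p in parts)
total_parts = sum(counts[p][1] for p in parts)
print("sitelinked items, parts vs whole:", sitelinked_parts, counts["whole"][0])
print("all items, parts vs whole:", total_parts, counts["whole"][1])
```

With these figures the sitelinked counts of the three parts sum exactly to the whole-Wikidata count, while the item totals differ by one.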

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  The datasets described in T288611#7283258 are now updated with correct data 
and can be found in this public directory.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  1. In **whole Wikidata**, we currently find `78,505,497` (out of 
`94,158,141`) items with at least one External Id: that would be about 83% of 
all Wikidata items, implying **17%** of items w/o External Ids.
  
  2. In **Astronomical Objects** alone, we currently find `11,757,567` (out of 
`11,759,422`) items with at least one External Id: that would be about 99.98% 
of all Astronomical Objects items, implying almost no items w/o External Ids.
  
  3. In **Scholarly Papers** alone, we currently find `39,556,773` (out of 
`39,672,491`) items with at least one External Id: that would be about 99.71% 
of all Scholarly Papers items, implying almost no items w/o External Ids.
  
  4. In **"core" Wikidata** (i.e. Wikidata - (Astronomical Objects + Scholarly 
Papers)), we currently find `32,824,937` (out of `48,360,368`) items with at 
least one External Id: that would be about 67.88% of all "core" Wikidata 
items, implying around **32.12%** of items w/o External Ids.
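One way to decide whether an item has "at least one External Id" when working from the JSON dump is to look at the `datatype` of each statement's main snak, which is `external-id` for identifier properties. A minimal sketch, assuming dump-style entity dicts in which each main snak carries that `datatype` field; the helper names and toy items are hypothetical, and qualifiers and references are ignored.

```python
import json


def has_external_id(entity: dict) -> bool:
    """True if any statement's main snak uses an external-id property."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            if statement.get("mainsnak", {}).get("datatype") == "external-id":
                return True
    return False


def share_with_external_id(entities: list[dict]) -> float:
    """Percentage of the given items that carry at least one external identifier."""
    if not entities:
        return 0.0
    hits = sum(has_external_id(e) for e in entities)
    return 100 * hits / len(entities)


# Made-up items: one with an identifier statement, one with only an item-valued claim.
toy = json.loads("""[
  {"id": "Q10", "claims": {"P214": [{"mainsnak": {"property": "P214",
      "datatype": "external-id", "snaktype": "value",
      "datavalue": {"value": "12345", "type": "string"}}}]}},
  {"id": "Q11", "claims": {"P31": [{"mainsnak": {"property": "P31",
      "datatype": "wikibase-item", "snaktype": "value",
      "datavalue": {"value": {"id": "Q5"}, "type": "wikibase-entityid"}}}]}}
]""")
print(share_with_external_id(toy))
```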
  
  **Now** I really need to focus on a partial re-do of the Sitelinks datasets 
in accordance with T288611#7296573.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  **IMPORTANT.** Probably all numbers, except those reported for whole 
Wikidata, will have to be corrected here.
  Until now, I have been using WDQS to obtain the instances of all sub-classes 
of Astronomical Objects and Scholarly Articles.
  How naive of me. I have just realized (while I should have been well aware 
of the fact) that some of my queries in Scholarly Articles time out.
  The consequence is that I have only partial lists of items that are instances 
of Scholarly Articles. Most probably, nothing similar has happened in 
Astronomical Objects.
  
  Re-running everything on the dump (PySpark ETL). Reporting back ASAP.
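A rough, item-wise sketch of the dump-based approach, written with the Python standard library instead of PySpark so that it is self-contained. It assumes the JSON dump layout of one entity per line inside a top-level JSON array, and that the class sets for Astronomical Objects and Scholarly Articles have already been expanded to all sub-classes; `ASTRO_CLASSES`, `SCHOLARLY_CLASSES` and the helper names are hypothetical, and the class ids are placeholders. Scanning the dump is not subject to query timeouts, so no item list is silently truncated.

```python
import io
import json
from typing import Iterator, TextIO

# Placeholder class ids (hypothetical); in practice these would be the full
# sub-class closures of Astronomical Objects and Scholarly Articles.
ASTRO_CLASSES = {"Q_ASTRO"}
SCHOLARLY_CLASSES = {"Q_PAPER"}
CLASS_PROPS = ("P31", "P279", "P361")  # instance of, subclass of, part of


def iter_entities(dump: TextIO) -> Iterator[dict]:
    """Yield entities from a JSON dump laid out as '[', one entity per line, ']'."""
    for line in dump:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def class_ids(entity: dict) -> set[str]:
    """Item ids used as values of P31, P279 or P361 in the entity's main snaks."""
    ids = set()
    for prop in CLASS_PROPS:
        for statement in entity.get("claims", {}).get(prop, []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") == "value":
                ids.add(snak["datavalue"]["value"]["id"])
    return ids


def segment(classes: set[str]) -> str:
    """Put each item in exactly one segment; astronomical wins ties (arbitrary choice)."""
    if classes & ASTRO_CLASSES:
        return "astronomical"
    if classes & SCHOLARLY_CLASSES:
        return "scholarly"
    return "core"


def summarize(dump: TextIO) -> dict[str, list[int]]:
    """Per segment: [items, items with at least one sitelink]. Items without
    P31/P279/P361 are skipped, mirroring the counts reported in this task."""
    totals: dict[str, list[int]] = {}
    for entity in iter_entities(dump):
        classes = class_ids(entity)
        if not classes:
            continue
        row = totals.setdefault(segment(classes), [0, 0])
        row[0] += 1
        row[1] += int(bool(entity.get("sitelinks")))
    return totals


def _claim(qid: str) -> list[dict]:
    """Minimal item-valued statement, enough for class_ids()."""
    return [{"mainsnak": {"snaktype": "value", "datavalue": {"value": {"id": qid}}}}]


# Tiny made-up dump in the same line layout.
toy_dump = io.StringIO("\n".join([
    "[",
    json.dumps({"id": "Q1", "claims": {"P31": _claim("Q_ASTRO")}, "sitelinks": {}}) + ",",
    json.dumps({"id": "Q2", "claims": {"P31": _claim("Q_PAPER")}, "sitelinks": {}}) + ",",
    json.dumps({"id": "Q3", "claims": {"P31": _claim("Q5")},
                "sitelinks": {"enwiki": {"site": "enwiki", "title": "X"}}}) + ",",
    json.dumps({"id": "Q4", "claims": {}, "sitelinks": {}}),
    "]",
]))
result = summarize(toy_dump)
print(result)
```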

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel From our 1:1
  
  > Number and % of items in WD with (no) external identifier [split by core, 
astronomical, citation]
  
  
  
  - ETL phase completed, datasets obtained;
  - re-composition in R; in-RAM analysis now in progress.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  Here are a few more general statistics on the whole of Wikidata to consider:
  
  - we consider `590,404` classes in total;
  - `307,646` classes (52%) do not have a single item with a sitelink;
  - here are (a) a chart with the top 50 classes by number of items missing 
sitelinks, and
  - (b) a table (`csv`, `zip` compression) with all of the classes listed, 
sorted by the number of items w/o sitelinks; English labels are provided for 
the top 1,000 classes only.
  
  F34604236: Wikidata_NO_SITELINKS.png 

  
  F34604241: noSLClasses.zip 
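Given a per-class table with the columns described in the 2021-08-14 comment (`class`, `num_items`, `num_items_w_sitelinks`), the statistics above and the ranking behind the chart could be derived roughly as follows. A sketch with made-up rows; `class_stats` is a hypothetical helper, and the per-project columns are omitted.

```python
import csv
import io


def class_stats(rows: list[dict], top_n: int = 50):
    """Return (number of classes, classes with no sitelinked item,
    top-N classes by number of items missing sitelinks)."""
    missing = []
    no_sitelink_classes = 0
    for row in rows:
        num_items = int(row["num_items"])
        with_sl = int(row["num_items_w_sitelinks"])
        if with_sl == 0:
            no_sitelink_classes += 1
        missing.append((row["class"], num_items - with_sl))
    missing.sort(key=lambda pair: pair[1], reverse=True)
    return len(rows), no_sitelink_classes, missing[:top_n]


# Made-up rows in the column layout of the contingency files.
sample = io.StringIO(
    "class,num_items,num_items_w_sitelinks,total_sitelinks\n"
    "Q_A,100,0,0\n"
    "Q_B,50,45,120\n"
    "Q_C,30,0,0\n"
)
rows = list(csv.DictReader(sample))
n_classes, n_empty, top = class_stats(rows, top_n=2)
print(n_classes, n_empty, f"{100 * n_empty / n_classes:.0f}%", top)
```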

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  > Do we know why there are so many astronomical objects with sitelinks? (e.g. 
what projects do they predominantly connect to?)
  
  The following table should help answer your question.
  
  F34601012: astrFrame.csv 
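To see which projects astronomical items predominantly link to, one option is to sum each per-project column of `contingency_WD_ASTRONOMY` over all classes and rank the projects. A sketch, assuming every column other than `class`, `num_items`, `num_items_w_sitelinks`, `total_sitelinks` and `NO_SITELINK` is a project column; the project names and numbers in the sample are made up. Summing class rows can count an item more than once if it belongs to several classes.

```python
import csv
import io
from collections import Counter

NON_PROJECT_COLUMNS = {"class", "num_items", "num_items_w_sitelinks",
                       "total_sitelinks", "NO_SITELINK"}


def project_totals(reader: csv.DictReader) -> Counter:
    """Sum each project column over all class rows."""
    totals: Counter = Counter()
    for row in reader:
        for column, value in row.items():
            if column not in NON_PROJECT_COLUMNS:
                totals[column] += int(value)
    return totals


# Made-up sample; real files have one column per Wikimedia project.
sample = io.StringIO(
    "class,num_items,num_items_w_sitelinks,total_sitelinks,NO_SITELINK,wiki_x,wiki_y\n"
    "Q_STAR,10,6,9,4,6,3\n"
    "Q_GALAXY,5,2,2,3,0,2\n"
)
totals = project_totals(csv.DictReader(sample))
print(totals.most_common())
```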

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  From our 1:1 TUE 17. August 2021:
  
  > Number and % of items in WD with (no) sitelinks [split by core, 
astronomical, citation]
  
  **"Core" Wikidata (i.e. Wikidata - (Astronomical Objects + Scholarly 
Articles))**
  
  - number of items w. sitelinks: `27907021`, percent of items w. sitelinks: 
`31.35%`
  
  **Astronomical Objects only**
  
  - number of items w. sitelinks: `480508`, percent of items w. sitelinks: 
`3.99%`
  
  **Scholarly Articles only**
  
  - number of items w. sitelinks: `22063`, percent of items w. sitelinks: 
`0.47%`
  
  **N.B.** Please take into consideration that these data are 
//approximate//, because an item can be an instanceOf/subclassOf/partOf 
several different classes, and our source dataset here is organized 
class-wise, not item-wise. However, I doubt that the result would change 
significantly if we went for a whole new item-wise ETL here.
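A tiny illustration of why class-wise figures are approximate: adding up per-class rows counts an item once for every class it belongs to, while an item-wise count deduplicates by item id. The memberships below are made up.

```python
# Made-up memberships: item id -> (classes it belongs to, has at least one sitelink).
items = {
    "Q1": ({"Q_STAR", "Q_VARIABLE_STAR"}, True),
    "Q2": ({"Q_STAR"}, False),
    "Q3": ({"Q_GALAXY"}, True),
}

# Class-wise: build per-class counts first, then add the rows up.
per_class: dict[str, list[int]] = {}
for classes, has_sitelink in items.values():
    for cls in classes:
        row = per_class.setdefault(cls, [0, 0])
        row[0] += 1
        row[1] += int(has_sitelink)
classwise_items = sum(r[0] for r in per_class.values())
classwise_with_sl = sum(r[1] for r in per_class.values())

# Item-wise: each item counted exactly once.
itemwise_items = len(items)
itemwise_with_sl = sum(int(has_sl) for _, has_sl in items.values())

print(classwise_items, classwise_with_sl)  # Q1 is counted twice here
print(itemwise_items, itemwise_with_sl)
```

The gap between the two counts grows with the number of items that sit in several classes at once.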

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-14 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  - The data are published here (tar.gz -> .csv files), which is better than 
Google Drive;
  
  - **Filenames**
    - `contingency_WD_FULL.tar.gz` - everything, whole Wikidata
    - `contingency_WD_CORE.tar.gz` - Wikidata - (Astronomical Objects + Scholarly Articles)
    - `contingency_WD_CITATIONS.tar.gz` - Scholarly Articles only
    - `contingency_WD_ASTRONOMY.tar.gz` - Astronomical Objects only
  
  - **Columns**
    - A set of columns, each indicating a particular WMF project; please **note** that there is one column called `NO_SITELINK` among them;
    - `class` - the Wikidata class in the respective row
    - `num_items` - how many items are found in this Wikidata class
    - `num_items_w_sitelinks` - how many items w. sitelinks are found in this Wikidata class
    - `total_sitelinks` - how many sitelinks in total exist for the items in this class
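A minimal sketch for loading one of these archives with the Python standard library, assuming each `.tar.gz` contains one or more `.csv` members with a header row using the column names above; `load_contingency` and `_demo_archive` are hypothetical helpers. The usage builds a small archive in memory, since the real files are not reproduced here.

```python
import csv
import io
import tarfile


def load_contingency(path=None, fileobj=None) -> list[dict]:
    """Read every CSV member of a contingency_WD_*.tar.gz archive into one list of rows."""
    rows: list[dict] = []
    with tarfile.open(name=path, mode="r:gz", fileobj=fileobj) as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                rows.extend(csv.DictReader(text))
    return rows


def _demo_archive() -> io.BytesIO:
    """Build a small in-memory archive with the column layout described above."""
    data = (
        "class,num_items,num_items_w_sitelinks,total_sitelinks,NO_SITELINK\n"
        "Q_A,10,4,7,6\n"
        "Q_B,5,5,9,0\n"
    ).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="contingency_demo.csv")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


rows = load_contingency(fileobj=_demo_archive())
items = sum(int(r["num_items"]) for r in rows)
with_sl = sum(int(r["num_items_w_sitelinks"]) for r in rows)
print(f"{with_sl}/{items} items with sitelinks ({100 * with_sl / items:.1f}%)")
```

For a real file, `load_contingency("contingency_WD_CORE.tar.gz")` would be the corresponding call.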

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Manuel
  
  The general case (whole Wikidata) is solved, result:
  
  - a table
    - rows: Wikidata classes
    - columns: Wikimedia projects
    - cells: number of items in a particular class w. sitelinks towards a particular project
    - additional columns:
      - number of items in the class
      - number of items w. sitelinks in the class
      - total number of sitelinks in the class
  
  The dataset is huge and will be shared via Google Drive.
  
  I will deliver tomorrow (Friday), as agreed:
  
  - reduced dataset 1: Wikidata - (Scholarly Articles + Astronomical Objects)
  - reduced dataset 2: Scholarly Articles
  - reduced dataset 3: Astronomical Objects
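The table described above (classes as rows, projects as columns, plus per-class totals) can be sketched in plain Python from item-level records. The record format `(item id, classes, sitelinked sites)`, the site ids, and `build_contingency` are assumptions for illustration; this is not the ETL used in this task.

```python
from collections import defaultdict

# Hypothetical item records: (item id, classes the item belongs to, sites it links to).
records = [
    ("Q1", {"Q_CITY"}, {"enwiki", "dewiki"}),
    ("Q2", {"Q_CITY"}, set()),
    ("Q3", {"Q_RIVER", "Q_CITY"}, {"enwiki"}),
]


def build_contingency(records):
    """Per class: items per project, plus num_items, num_items_w_sitelinks,
    total_sitelinks and NO_SITELINK."""
    table = defaultdict(lambda: defaultdict(int))
    for _item, classes, sites in records:
        for cls in classes:
            row = table[cls]
            row["num_items"] += 1
            row["num_items_w_sitelinks"] += int(bool(sites))
            row["total_sitelinks"] += len(sites)
            row["NO_SITELINK"] += int(not sites)
            for site in sites:
                row[site] += 1  # an item has at most one sitelink per site
    return {cls: dict(row) for cls, row in table.items()}


table = build_contingency(records)
for cls, row in sorted(table.items()):
    print(cls, row)
```

Because the rows are built per class, an item in two classes (Q3 here) contributes to both rows.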

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-11 Thread GoranSMilovanovic
GoranSMilovanovic updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-11 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T288611

To: GoranSMilovanovic, Maintenance_bot
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331