[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-07-19 Thread JAllemandou
JAllemandou closed this task as "Resolved".
JAllemandou added a comment.


  The analysis is documented here: 
https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Basic_Analysis.
  Thanks @AKhatun_WMF :)

TASK DETAIL
  https://phabricator.wikimedia.org/T282139

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF, JAllemandou
Cc: Esc3300, GoranSMilovanovic, CBogen, AKhatun_WMF, Aklapper, JAllemandou, 
Invadibot, MPhamWMF, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-22 Thread AKhatun_WMF
AKhatun_WMF claimed this task.



[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-22 Thread AKhatun_WMF
AKhatun_WMF moved this task from Analysis to Current work on the 
Wikidata-Query-Service board.
AKhatun_WMF added a project: Discovery-Search (Current work).

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/



[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-10 Thread MPhamWMF
MPhamWMF triaged this task as "High" priority.



[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-06-03 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Some of the information suggested for extraction through this analysis:
  
  - Top items
  - Top properties
  - Top subject and object types
  - Top property types
  - Top Wikidata vs. other predicates
  - Number of S, P, O (subjects, predicates, objects) that don't involve Wikidata
    - The aim is to find the size of the subgraph that does not concern 
      Wikidata, i.e. the number of leaves. They are leaves because once they 
      point to something outside of Wikidata, they are not expanded within 
      Wikidata. Some, such as literals, cannot be expanded at all. If there are 
      too many leaves, we may consider using property graphs (where leaves are 
      listed as properties of a node).
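
  The predicate and leaf counts described above can be sketched on a handful of triples. This is only an illustration in plain Python, not the Spark notebook used for the actual analysis. The triples are toy data, the `WIKIDATA_PREFIX` test for "involves Wikidata" is an assumption, and the helper names (`is_wikidata`, `is_literal`, `summarize`) are hypothetical.

```python
from collections import Counter

# Assumption: a term "involves Wikidata" when its URI starts with this prefix.
WIKIDATA_PREFIX = "<http://www.wikidata.org/"

# Toy triples for illustration only; they are not taken from the real dataset.
TOY_TRIPLES = [
    ("<http://www.wikidata.org/entity/Q42>",
     "<http://www.wikidata.org/prop/direct/P31>",
     "<http://www.wikidata.org/entity/Q5>"),
    ("<http://www.wikidata.org/entity/Q42>",
     "<http://www.w3.org/2000/01/rdf-schema#label>",
     '"Douglas Adams"@en'),
    ("<http://www.wikidata.org/entity/Q42>",
     "<http://www.wikidata.org/prop/direct/P856>",
     "<https://example.org/>"),
    ("<http://www.wikidata.org/entity/Q5>",
     "<http://www.w3.org/2000/01/rdf-schema#label>",
     '"human"@en'),
]


def is_wikidata(term):
    """True when the term is a URI under the assumed Wikidata prefix."""
    return term.startswith(WIKIDATA_PREFIX)


def is_literal(term):
    """True when the term is an RDF literal (serialized with a leading quote)."""
    return term.startswith('"')


def summarize(triples):
    """Count predicates and leaf objects (objects outside Wikidata)."""
    predicate_counts = Counter(p for _, p, _ in triples)
    leaves = [o for _, _, o in triples if not is_wikidata(o)]
    return {
        "top_predicates": predicate_counts.most_common(),
        "wikidata_predicates": sum(
            n for p, n in predicate_counts.items() if is_wikidata(p)
        ),
        "other_predicates": sum(
            n for p, n in predicate_counts.items() if not is_wikidata(p)
        ),
        "leaf_objects": len(leaves),
        "literal_leaves": sum(1 for o in leaves if is_literal(o)),
    }


if __name__ == "__main__":
    for key, value in summarize(TOY_TRIPLES).items():
        print(key, value)
```

  On the real dataset the same counts would more likely be written as group-by aggregations in Spark; the plain-Python version only shows which quantities are being compared.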



[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-05-06 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.



[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset

2021-05-06 Thread JAllemandou
JAllemandou created this task.
JAllemandou added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a way to get familiar with the data, please provide quantitative 
information about the dataset using Spark in a notebook (probably with Python, 
as it makes producing charts easier).
  The data can be found in:
  

hdfs://analytics-hadoop/wmf/data/discovery/wikidata/rdf/date=20210419/wiki=wikidata
  
  There are multiple snapshot dates available, as well as multiple wikis 
(`wikidata` and `commons`). Just pick one date that has `wikidata` data :)
  In hive or spark-sql:
  
use discovery;
show partitions wikibase_rdf;
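
  Picking a snapshot from the partition listing can be sketched as below. This is a minimal illustration, assuming the listing uses Hive's usual `key=value/key=value` form and that the HDFS layout matches the path given above. The helper names and every sample partition other than `date=20210419/wiki=wikidata` are hypothetical.

```python
# Base directory taken from the path in the task description.
BASE = "hdfs://analytics-hadoop/wmf/data/discovery/wikidata/rdf"


def parse_partition(spec):
    """Turn 'date=20210419/wiki=wikidata' into {'date': ..., 'wiki': ...}."""
    return dict(part.split("=", 1) for part in spec.strip().split("/"))


def latest_partition_path(partition_lines, wiki="wikidata"):
    """Return the HDFS path of the most recent snapshot for the given wiki."""
    dates = [
        parts["date"]
        for parts in map(parse_partition, partition_lines)
        if parts.get("wiki") == wiki
    ]
    if not dates:
        raise ValueError(f"no partition found for wiki={wiki!r}")
    # YYYYMMDD strings sort chronologically, so max() picks the newest date.
    return f"{BASE}/date={max(dates)}/wiki={wiki}"


# Hypothetical listing; only date=20210419/wiki=wikidata comes from the task.
SAMPLE_PARTITIONS = [
    "date=20210412/wiki=commons",
    "date=20210412/wiki=wikidata",
    "date=20210419/wiki=commons",
    "date=20210419/wiki=wikidata",
]

if __name__ == "__main__":
    print(latest_partition_path(SAMPLE_PARTITIONS))
```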
