[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-14 Thread Lydia_Pintscher
Lydia_Pintscher closed this task as "Resolved". Lydia_Pintscher added a comment. No, let's close it. Thanks a lot :) TASK DETAIL https://phabricator.wikimedia.org/T269587 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic, Lydia_Pint

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-14 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Can we resolve this ticket or do we need anything else here?

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-12 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T269587#6685887, @GoranSMilovanovic wrote: > @Lydia_Pintscher Here it goes: > > F33943238: propertyLanguages_20201211.csv Thanks! Will analyze.

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-11 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Here it goes: F33943238: propertyLanguages_20201211.csv

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-11 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Of course, it will be produced and posted here during the day.

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-11 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T269587#6680787 , @GoranSMilovanovic wrote: > @Lydia_Pintscher > >> Check coverage of labels, descriptions, aliases on Properties > > Please see the `csv` file attached. Fields: > > - `prope

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-10 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. With respect to T269587#6680787: we need to change the anchor (languages with a Wikimedia Language Code).

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher > Check coverage of labels, descriptions, aliases on Properties Please see the `csv` file attached. Fields: - `property` - `labels` - how many labels - `aliases` - in how many different languages do we find aliases for this
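
The per-property language counts described in this comment could be computed along the following lines. This is only a minimal sketch in plain Python, not the code used to produce the attached file. It assumes each property is available as a Wikidata JSON entity, where `labels`, `descriptions`, and `aliases` are keyed by language code. The helper names `language_coverage` and `coverage_csv` are hypothetical, and the `descriptions` column is an assumption, since the field list above is cut off.

```python
import csv
import io

# Column order for the output file. "descriptions" is an assumption:
# the field list in the comment above is truncated.
FIELDS = ["property", "labels", "descriptions", "aliases"]


def language_coverage(prop: dict) -> dict:
    """Count the languages in which a property has labels, descriptions, aliases."""
    return {
        "property": prop["id"],
        # In Wikidata entity JSON each of these maps language code -> value(s),
        # so the number of keys is the number of languages covered.
        "labels": len(prop.get("labels", {})),
        "descriptions": len(prop.get("descriptions", {})),
        "aliases": len(prop.get("aliases", {})),
    }


def coverage_csv(properties) -> str:
    """Render one CSV row per property (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for prop in properties:
        writer.writerow(language_coverage(prop))
    return buf.getvalue()


# Illustrative entity, heavily abbreviated; not real dump content.
sample_property = {
    "id": "P31",
    "type": "property",
    "labels": {"en": {"language": "en", "value": "instance of"},
               "de": {"language": "de", "value": "ist ein(e)"}},
    "descriptions": {"en": {"language": "en", "value": "..."}},
    "aliases": {"en": [{"language": "en", "value": "is a"},
                       {"language": "en", "value": "type"}]},
}

print(coverage_csv([sample_property]))
```

Note that `aliases` counts languages, not individual aliases, which matches the field description above ("in how many different languages do we find aliases").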

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Ok, the data reported in T269587#6679451 seem to be fine. The list of all "hanging items" - items with no `P31`, `P279`, or `P361` value - relative to what was found in the `2020-
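
The "hanging items" definition in this comment can be illustrated with a small filter. This is a sketch in plain Python rather than the PySpark job mentioned elsewhere in the thread, and it is not the author's code. It assumes entities in the standard Wikidata JSON shape (`id`, `type`, `claims` keyed by property ID). The names `CLASSIFYING_PROPERTIES`, `is_hanging`, and `hanging_items` are hypothetical. It also treats any statement on a property as a value, including "no value" and "unknown value" statements, which may differ from the original filtering.

```python
# Properties that classify an item: instance of, subclass of, part of.
CLASSIFYING_PROPERTIES = ("P31", "P279", "P361")


def is_hanging(entity: dict) -> bool:
    """True if an item has no statement for P31, P279, or P361."""
    if entity.get("type") != "item":
        return False  # only items are considered, not properties or lexemes
    claims = entity.get("claims", {})
    # A missing key or an empty statement list both count as "no value".
    return not any(claims.get(pid) for pid in CLASSIFYING_PROPERTIES)


def hanging_items(entities) -> list:
    """Return the IDs of all hanging items, in input order."""
    return [entity["id"] for entity in entities if is_hanging(entity)]


# Illustrative entities with statement bodies elided; not real dump content.
sample = [
    {"id": "Q1", "type": "item", "claims": {"P31": [{}]}},
    {"id": "Q2", "type": "item", "claims": {"P18": [{}]}},
    {"id": "Q3", "type": "item", "claims": {}},
    {"id": "P10", "type": "property", "claims": {}},
]

print(hanging_items(sample))  # ['Q2', 'Q3']
```

The same predicate could be applied row by row in a distributed job. The in-memory list here only keeps the example self-contained.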

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher The data reported in T269587#6679451 will have to undergo revision; I have spotted a glitch in my filtering procedures in PySpark.

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher > properties that can be used to check completeness (e.g. "number of children" + "number of participants") > find a list of such "structural" properties Well, what I did was the following: - Apache Spark to parse the most rec

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-09 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher > How many entities do we have that are not classified via `instance of P31`, `subclass of P279`, and `part of P361`? According to the most recent HDFS copy of the Wikidata JSON dump (snapshot: `2020-11-23`):

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-09 Thread GoranSMilovanovic
GoranSMilovanovic updated the task description.

[Wikidata-bugs] [Maniphest] T269587: Low hanging fruits for the WMDE Data Quality WD/WB Team

2020-12-07 Thread GoranSMilovanovic
GoranSMilovanovic created this task. GoranSMilovanovic added projects: User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION - Produce all "immediately" available indicators derived from the discussion in the WMDE Da