GoranSMilovanovic added a comment.
Update 20210630
- join items x scores x classes: **done**
- all items with missing ORES predictions were filtered out;
- all duplicated set-theoretic/mereological relations were collapsed
(e.g. if an item refers to a class via both `P31` and `P279`, or via both `P31`
and `P361`, then that item contributes to the overall class quality once,
not twice);
- the resulting data set comprises 80,236,080 items assigned to 472,035 classes
under analysis.
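A minimal sketch of the join described in this update, on toy data. The input shapes, variable names, and the function `join_items_scores_classes` are assumptions made for illustration; the actual tables and the code that processed them are not shown in this comment.

```python
# Hypothetical toy inputs; the real tables and their schemas are not given here.
# Each relation is (item, property, class). Q1 reaches class Q100 via both
# P31 and P279, so it must be counted once for that class.
item_class_relations = [
    ("Q1", "P31", "Q100"),
    ("Q1", "P279", "Q100"),
    ("Q2", "P31", "Q100"),
    ("Q2", "P361", "Q200"),
    ("Q3", "P31", "Q200"),
]

# ORES quality prediction per item; Q3 has no prediction.
item_scores = {"Q1": "A", "Q2": "C"}


def join_items_scores_classes(relations, scores):
    """Return sorted, unique (item, class, score) rows."""
    rows = set()
    for item, _prop, cls in relations:
        score = scores.get(item)
        if score is None:
            # Items without an ORES prediction are filtered out.
            continue
        # Dropping the property and using a set collapses duplicates such as
        # P31 + P279 pointing from the same item to the same class.
        rows.add((item, cls, score))
    return sorted(rows)


joined = join_items_scores_classes(item_class_relations, item_scores)
for row in joined:
    print(row)
```

On the toy data, Q1 contributes to Q100 once despite having two relations to it, and Q3 is dropped because it has no score.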
Next steps:
(2) classes x scores (wide data representation: this might not be necessary,
depends upon the decision in (3)) →
(3) decide upon a clustering procedure →
(4) cluster (either with Apache Spark MLlib or with an in-memory R procedure run
from the Analytics Clients).
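Step (2) could look roughly like the following sketch, which pivots long (item, class, score) rows into one row of score counts per class. The score labels A to E, the toy rows, and the helper `to_wide` are assumptions for illustration; whether this wide form is needed at all depends on the decision in (3).

```python
from collections import Counter

# Assumed ORES item quality labels; adjust if the model uses a different set.
SCORE_LEVELS = ["A", "B", "C", "D", "E"]

# Hypothetical long-format rows: (item, class, score).
long_rows = [
    ("Q1", "Q100", "A"),
    ("Q2", "Q100", "C"),
    ("Q2", "Q200", "C"),
    ("Q4", "Q200", "E"),
]


def to_wide(rows, levels):
    """Map each class to a list of item counts, one count per score level."""
    counts = Counter((cls, score) for _item, cls, score in rows)
    classes = sorted({cls for _item, cls, _score in rows})
    # Counter returns 0 for class/score pairs that never occur.
    return {cls: [counts[(cls, lvl)] for lvl in levels] for cls in classes}


wide = to_wide(long_rows, SCORE_LEVELS)
for cls, vector in wide.items():
    print(cls, vector)
```

Each per-class vector could then serve as the feature row for whichever clustering procedure is chosen in (3).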
TASK DETAIL
https://phabricator.wikimedia.org/T285458