GoranSMilovanovic added a comment.
Update 20210630 - join items x scores x classes: **done**

- all items with missing ORES predictions were filtered out;
- all duplicated set-theoretic/mereological relations were counted only once (e.g. if an item refers to a class via both `P31` and `P279`, or via both `P31` and `P361`, then that item's contribution to the overall class quality counts as one contribution, not two);
- in total, 80,236,080 items assigned to 472,035 classes are under analysis.

Next steps:

- (2) classes x scores (wide data representation; this might not be necessary, depending on the decision in (3));
- (3) decide upon a clustering procedure;
- (4) cluster (either Apache Spark MLlib or an in-RAM R procedure from the Analytics Clients).

TASK DETAIL
https://phabricator.wikimedia.org/T285458
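For illustration only, the filtering and deduplication described in this update could be sketched as follows. This is a minimal plain-Python sketch under stated assumptions, not the code actually run for the task: the toy item IDs, class IDs, score labels, and the function name `join_items_scores_classes` are all hypothetical, and the real join presumably runs on a cluster rather than in memory.

```python
from collections import defaultdict

# Hypothetical toy data; the real tables hold roughly 80M items and 472K classes.
# Each tuple is (item, property, class): a statement linking an item to a class.
item_class_statements = [
    ("Q1", "P31", "Q100"),
    ("Q1", "P279", "Q100"),  # same item/class pair reached via a second property
    ("Q2", "P31", "Q100"),
    ("Q2", "P361", "Q200"),
    ("Q3", "P31", "Q200"),   # Q3 has no ORES prediction in the table below
]

# Hypothetical ORES predictions per item (None or absent means missing).
ores_scores = {"Q1": "A", "Q2": "C", "Q3": None}


def join_items_scores_classes(statements, scores):
    """Return {class: {score_label: item_count}}, counting each (item, class) pair once."""
    # Deduplicate: an item contributes to a class once, whichever property links them.
    pairs = {(item, cls) for item, _prop, cls in statements}
    counts = defaultdict(lambda: defaultdict(int))
    for item, cls in pairs:
        score = scores.get(item)
        if score is None:
            continue  # filter out items with a missing ORES prediction
        counts[cls][score] += 1
    return {cls: dict(by_score) for cls, by_score in counts.items()}


class_scores = join_items_scores_classes(item_class_statements, ores_scores)
```

The resulting per-class score counts are one possible input for the classes x scores representation mentioned in step (2).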