GoranSMilovanovic added a comment.

  Update 20210630
  
  - join items x scores x classes: **done**
    - all items with missing ORES predictions were filtered out;
    - all duplicated set-theoretic/mereological relations were collapsed 
(e.g. if an item refers to a class via both `P31` and `P279`, or via both `P31` 
and `P361`, then that item counts as one contribution to the overall class 
quality, not two);
    - the resulting dataset comprises 80,236,080 items assigned to 472,035 
classes under analysis.
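
A minimal sketch of the filtering and de-duplication described above, in plain 
Python on toy data. The tuple layout, the item and class IDs, the score labels, 
and the function name `join_items_scores_classes` are all hypothetical; the 
actual join most likely runs on much larger tables with a different schema.

```python
# Hypothetical toy inputs: (item, property, class) statements.
statements = [
    ("Q1", "P31", "Q5"),
    ("Q1", "P279", "Q5"),  # same item/class pair reached via a second property
    ("Q2", "P31", "Q5"),
    ("Q2", "P361", "Q7"),
    ("Q3", "P31", "Q7"),   # Q3 has no ORES prediction below
]

# Hypothetical item -> ORES score mapping (labels are assumed).
scores = {"Q1": "A", "Q2": "C"}


def join_items_scores_classes(statements, scores):
    """Return (item, class, score) rows, one per distinct item/class pair."""
    seen = set()
    rows = []
    for item, _prop, cls in statements:
        if item not in scores:
            # Filter out items with a missing ORES prediction.
            continue
        key = (item, cls)
        if key in seen:
            # The item already contributes to this class via another property.
            continue
        seen.add(key)
        rows.append((item, cls, scores[item]))
    return rows


rows = join_items_scores_classes(statements, scores)
for row in rows:
    print(row)
```

Keying the de-duplication on the (item, class) pair rather than on the full 
statement is what makes an item count once per class regardless of which 
property links them.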
  
  Next steps:
  
  (2) classes x scores (wide data representation; this might not be necessary, 
depending on the decision in (3)) → 
  (3) decide upon a clustering procedure → 
  (4) cluster (either Apache Spark MLlib or an in-RAM R procedure run from the 
Analytics Clients).
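
As an illustration of what the wide representation in step (2) could look like, 
here is a small Python sketch that pivots long-form (item, class, score) rows 
into one count vector per class. The score levels `A` to `E`, the toy rows, and 
the helper name `to_wide` are assumptions made for the example, not part of the 
actual pipeline.

```python
from collections import Counter, defaultdict

# Assumed ordered score levels; adjust to the labels actually in the data.
SCORE_LEVELS = ["A", "B", "C", "D", "E"]


def to_wide(rows, levels=SCORE_LEVELS):
    """Pivot (item, class, score) rows into class -> list of counts per level."""
    counts = defaultdict(Counter)
    for _item, cls, score in rows:
        counts[cls][score] += 1
    # A Counter returns 0 for levels that never occur in a class.
    return {cls: [c[level] for level in levels] for cls, c in counts.items()}


# Hypothetical long-form rows, one per distinct item/class pair.
long_rows = [("Q1", "Q5", "A"), ("Q2", "Q5", "C"), ("Q2", "Q7", "C")]
wide = to_wide(long_rows)
for cls, vector in wide.items():
    print(cls, vector)
```

Each class then becomes a fixed-length vector of score counts, which is the 
shape most clustering procedures expect as input; whether raw counts or 
proportions are more suitable depends on the decision in (3).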

TASK DETAIL
  https://phabricator.wikimedia.org/T285458

To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher, Tobi_WMDE_SW, Manuel, GoranSMilovanovic, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331