GoranSMilovanovic added a comment.

  @Manuel
  
  > Maybe let's quickly talk about this in our 1:1?
  
  Of course.
  
  > What would you cluster by?
  
  Well, I guess in the beginning it would only be a matrix of (1) Wikidata 
classes x (2) the counts of ORES A, B, C, D, E scored items per class. That 
would be the most straightforward exploration of the distribution of ORES 
quality scores across the classes, and it would help us group at least some of 
those half a million classes together in (hopefully) meaningful clusters : )
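
  A minimal sketch of that matrix, assuming the scored items arrive as 
(class, grade) pairs. The function name `grade_count_matrix` and the sample 
rows are made up for illustration; they are not taken from the actual dataset 
or pipeline.

```python
from collections import Counter

# The five ORES quality grades, in column order.
GRADES = ("A", "B", "C", "D", "E")


def grade_count_matrix(scored_items):
    """Return {class_id: [count_A, count_B, count_C, count_D, count_E]}.

    `scored_items` is an iterable of (class_id, grade) pairs, one per
    scored item (hypothetical input shape).
    """
    counts = Counter(scored_items)
    classes = sorted({class_id for class_id, _ in counts})
    # Counter returns 0 for (class, grade) combinations that never occur.
    return {c: [counts[(c, g)] for g in GRADES] for c in classes}


# Made-up sample: three items in one class, three in another.
sample = [
    ("Q5", "A"), ("Q5", "C"), ("Q5", "C"),
    ("Q16521", "D"), ("Q16521", "E"), ("Q16521", "E"),
]
matrix = grade_count_matrix(sample)
for class_id, row in matrix.items():
    print(class_id, row)
```

  Each row is then one class described by five integers, which is the input 
the clustering would work on.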
  
  > What additional information could we join in? (I was thinking about some 
user and/or edit data like last edited, number of unique users, number of 
edits, etc. that could give meaningful clusters.)
  
  All that you are saying makes sense, except that I would not try to solve a 
more complicated problem (ORES scores + additional information on Wikidata 
classes --> clusters) before the already very complicated problem (ORES scores 
--> clusters) is solved. As I hope to be able to explain in our 1:1 today, 
clustering `472,035` Wikidata classes across five simple integer observations 
(A, B, C, D, E) already presents a challenge. So my suggestion would be to 
start small.
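
  To give a rough sense of the scale, here is a back-of-the-envelope sketch. 
It assumes a clustering method that needs every pairwise distance between 
classes, stored as 8-byte floats. Both are assumptions made for illustration; 
no method has been chosen in this thread.

```python
N_CLASSES = 472_035        # number of Wikidata classes quoted above
BYTES_PER_DISTANCE = 8     # assumption: one float64 per pairwise distance


def pairwise_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


pairs = pairwise_count(N_CLASSES)
gib = pairs * BYTES_PER_DISTANCE / 2**30
print(f"{pairs:,} pairs, about {gib:,.0f} GiB as float64")
```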

TASK DETAIL
  https://phabricator.wikimedia.org/T285458

To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher, Tobi_WMDE_SW, Manuel, GoranSMilovanovic, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331