abian added a comment.

  Thank you both! :-)
  
  I have several concerns about how users may use and understand this 
indicator. I'll list the main ones below in case you find them helpful; of 
course, I have no intention of hindering your work or of preventing us from 
having metrics that help us better understand how useful the data could be. 
My main concerns have to do with the specification 
<https://www.wikidata.org/wiki/Wikidata:Item_quality> and with its possibly 
unrealistic ambitions.
  
  - The specification is intentionally ambiguous ("most appropriate", 
"applicable", "some important", "significant and relevant", "solid", "high 
quality", "non-trivial"...) so that AI resolves these ambiguities, and, since 
it was created as part of a project built around AI, leaving AI aside was 
definitely not an option. Otherwise, it would have been possible to keep the 
natural ambiguities in the short descriptions (so that users can understand 
them easily) while avoiding most ambiguities in the detailed wording. Because 
the specification is so ambiguous and must be disambiguated by AI, the 
effective, ambiguity-free specification is a black box from the beginning: a 
model that is not explainable in terms of how the AI resolves the 
ambiguities, hard to test, and prone to training problems that can be 
difficult to detect and fix.
  - Some ambiguities will simply never be resolved, since the AI cannot be 
provided with all the data it would need. Some will inevitably be ignored or 
wrongly handled (and it won't be easy to detect which ones). In this sense 
the specification is too ambitious: it makes the AI bite off more than it can 
chew, while leaving out other aspects that are important for measuring 
completeness.
  - Gut feeling: I have the impression that the specification is complex 
enough to cause too much cognitive load 
<https://en.wikipedia.org/wiki/Cognitive_load> on the users who make the 
judgments used to train the model. If so, those judgments probably can't take 
all the required criteria into account at the same time.
  - The specification is not formally agreed upon or approved and still 
carries the `{{Draft}}` template, added by its main author. If we started 
using the indicator derived from this specification now to track the quality 
of Wikidata Items over time, we wouldn't be able to significantly improve any 
part of the specification and implementation stack, since every important 
change would make historical data incomparable with current data.
  - If the resulting indicator, whose formula would not be explainable, were 
called "data quality" and published in a way that could be queried (T166427 
<https://phabricator.wikimedia.org/T166427>; see the sketch after this list), 
users would indeed treat The Indicator as a synonym for "data quality". They 
would use it to sort and prioritize their work, perhaps ignoring the 
best-rated Items (which could still have relevant problems The Indicator does 
not consider, such as vandalism, outdated data, structural inconsistencies, 
constraint violations, etc.) and focusing on the worst-rated ones (even if 
these were actually good on the criteria that were wrongly quantified, 
undervalued or not considered, and even if they had no impact on the 
project). People would probably rely less on their own reasoning and 
exploration criteria and start letting The Indicator guide their efforts as 
if it were an oracle. In my opinion, this could divert the efforts of some 
users down the wrong path and, given the deficiencies this indicator would 
have, be more counterproductive than beneficial for the project.
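  
  To make that last point concrete, here is a minimal Python sketch of how 
easily users could sort Items by the indicator once it is queryable. It 
assumes the `itemquality` model exposed for `wikidatawiki` through the public 
ORES v3 scoring API; the revision IDs are placeholders, not real examples 
from this task.
  
    import requests
    
    # Fetch itemquality predictions for some revisions and sort them from
    # worst to best. The revision IDs below are placeholders.
    REVIDS = ["123456789", "987654321"]
    GRADES = {"E": 0, "D": 1, "C": 2, "B": 3, "A": 4}  # worst to best
    
    response = requests.get(
        "https://ores.wikimedia.org/v3/scores/wikidatawiki/",
        params={"models": "itemquality", "revids": "|".join(REVIDS)},
        timeout=10,
    )
    response.raise_for_status()
    scores = response.json()["wikidatawiki"]["scores"]
    
    # One letter-grade prediction per revision.
    predictions = {
        revid: data["itemquality"]["score"]["prediction"]
        for revid, data in scores.items()
    }
    for revid, grade in sorted(predictions.items(),
                               key=lambda kv: GRADES[kv[1]]):
        print(revid, grade)
  
  The easier this kind of query becomes, the stronger the oracle effect 
described above.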
  
  While the specification was being designed I kept in touch with its author 
and we discussed these problems somewhat superficially, but the constraints 
of the academic project probably didn't leave much room for action. Now that 
we no longer have those constraints, if we want to use the specification, I 
think we should improve it. I've checked some of the Items listed in the 
reports and I'm afraid the resulting indicator is no better than Recoin 
<https://www.wikidata.org/wiki/Wikidata:Recoin> at measuring relative 
completeness, and it is clearly inferior to property constraints at measuring 
consistency. If the decision to start using this indicator without changes 
has already been made, I would suggest calling it "ORES completeness" or 
something similar as a workaround, to try to avoid some of the effects of 
possible misuse.
  
  I hope I don't sound like the troll in my profile picture (it's actually an 
enemy from the first The Legend of Zelda game) :-) and I hope these comments 
can help in some way.
  
  In T195702#5513571 <https://phabricator.wikimedia.org/T195702#5513571>, 
@GoranSMilovanovic wrote:
  
  > @abian In WikidataCon 2019 we will have a Data quality panel 
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2019/Program/Sessions/Data_quality_panel>,
 as well as a Data quality meetup 
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2019/Program/Sessions/Data_quality_meetup>.
 I also hope to learn more about the possible ways of Wikidata quality 
assessment there. See you in Berlin this October maybe?
  
  Hopefully we'll meet there, yes!

TASK DETAIL
  https://phabricator.wikimedia.org/T195702
