abian added a comment.
Thank you both! :-) I have several concerns about how users may use and understand this indicator; I'll list the main ones in case you find them helpful, of course without any intention of hindering your work or preventing us from having metrics that help us better understand how useful the data could be. My main concerns have to do with the specification <https://www.wikidata.org/wiki/Wikidata:Item_quality> and with its possibly unrealistic ambitions.

- The specification is intentionally ambiguous ("most appropriate", "applicable", "some important", "significant and relevant", "solid", "high quality", "non-trivial"...) so that AI can resolve these ambiguities, and, since the specification was created as part of an AI project, leaving AI aside was never an option. Otherwise, it would have been possible to keep the natural ambiguities in the short descriptions (so that users can understand them easily) while avoiding many ambiguities in the detailed wording. Because the specification is so ambiguous and must be disambiguated by AI, the disambiguated specification is a black box from the start: a model that cannot explain how the ambiguities are resolved, that is hard to test, and that is prone to training problems that can be difficult to detect and fix.
- Some ambiguities simply won't be resolved at any point, since the AI cannot be given all the data it would need. Some ambiguities will inevitably be ignored or handled incorrectly (and it won't be easy to detect which ones). In this sense the specification is too ambitious: it makes the AI bite off more than it can chew, while not considering other aspects that are important for measuring completeness.
- Gut feeling: I have the impression that the specification is complex enough to place too much cognitive load <https://en.wikipedia.org/wiki/Cognitive_load> on the users whose judgments train the model. This means those judgments probably can't take all the required criteria into account at the same time.
- The specification has not been formally agreed upon or approved and still carries the `{{Draft}}` template, added by its main author. If we started using the indicator derived from this specification to track the quality of Wikidata Items over time now, we wouldn't be able to significantly improve any part of the specification and implementation stack, since every important change would make historical data incomparable with current data.
- If the resulting indicator, whose formula would not be explainable, were called "data quality" and published in a queryable way (T166427 <https://phabricator.wikimedia.org/T166427>; see the sketch after this list), users would indeed treat The Indicator as a synonym for "data quality". They would use it to sort and prioritize their work, perhaps ignoring the best-rated Items (which could have relevant problems The Indicator does not consider, such as vandalism, outdated data, structural inconsistencies, constraint violations, etc.) and focusing on the worst-rated ones (even if those were actually good according to criteria that were wrongly quantified, undervalued, or not considered at all, and even if those Items had no impact on the project). People would probably rely less on their own reasoning and exploration criteria and start letting The Indicator guide their efforts as if it were an oracle. In my opinion, this could divert the efforts of some users down the wrong path and, given the deficiencies this indicator would have, end up being more counterproductive than beneficial for the project.
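For illustration, here is a minimal sketch of what "queryable" could look like in practice, assuming the model were exposed through ORES's v3 scores endpoint under the model name `itemquality`, the way other ORES models are served; the endpoint shape and the revision ID are assumptions for the example, not a description of an existing deployment:

```python
import requests

# Sketch (assumed setup): ask the ORES v3 scores endpoint for the item
# quality prediction of a single Wikidata revision.
ORES_URL = "https://ores.wikimedia.org/v3/scores/wikidatawiki/"

def item_quality(rev_id):
    """Return the predicted quality class (A-E) and its class probabilities."""
    response = requests.get(
        ORES_URL,
        params={"models": "itemquality", "revids": rev_id},
        timeout=30,
    )
    response.raise_for_status()
    score = response.json()["wikidatawiki"]["scores"][str(rev_id)]["itemquality"]["score"]
    return score["prediction"], score["probability"]

if __name__ == "__main__":
    prediction, probabilities = item_quality(123456789)  # placeholder revision ID
    print(prediction, probabilities)
```

A single opaque class label like this is exactly what users would sort and filter by, which is why the concerns above about what the label does and doesn't capture matter so much.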
When the specification was being designed I kept in touch with its author and we discussed these problems, albeit somewhat superficially, but the constraints of the academic project probably didn't leave much room for action. Now we no longer have those constraints and, if we want to use the specification, I think we should improve it. I've checked some of the Items listed in the reports and, unfortunately, I think the resulting indicator is no better than Recoin <https://www.wikidata.org/wiki/Wikidata:Recoin> at measuring relative completeness, and it clearly falls short of property constraints at measuring consistency. If the decision to start using this indicator without changes has already been made, I would suggest calling it "ORES completeness" or similar as a workaround to mitigate some of the effects of possible misuse.

I hope I'm not sounding like the troll in my profile picture (it's actually an enemy from the first The Legend of Zelda) :-) and that these comments help in some way.

In T195702#5513571 <https://phabricator.wikimedia.org/T195702#5513571>, @GoranSMilovanovic wrote:

> @abian In WikidataCon 2019 we will have a Data quality panel <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2019/Program/Sessions/Data_quality_panel>, as well as a Data quality meetup <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2019/Program/Sessions/Data_quality_meetup>. I also hope to learn more about the possible ways of Wikidata quality assessment there. See you in Berlin this October maybe?

Hopefully we'll meet there, yes!

TASK DETAIL
https://phabricator.wikimedia.org/T195702
