Lucas_Werkmeister_WMDE added a comment.
Before we can wipe the term_search_key column, we need to decide what to do about label conflict detection, which still relies on it. We can either add another LabelConflictFinder implementation based on CirrusSearch/Elasticsearch, or use term_text instead of term_search_key in conflict detection (and accept that “foo” and “foO” will no longer be detected as a conflict).
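To make the trade-off concrete, here is a minimal sketch of conflict detection that groups labels by a comparison key. The function `find_label_conflicts` and the `(entity_id, language, label)` records are hypothetical and much simpler than the real LabelConflictFinder. Using the raw text as the key mimics comparing term_text, while a case-folded key mimics one aspect of term_search_key.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical label record: (entity_id, language, label text).
Label = Tuple[str, str, str]


def find_label_conflicts(
    labels: List[Label], key: Callable[[str], str]
) -> Dict[Tuple[str, str], List[str]]:
    """Group entities whose labels share a language and comparison key.

    Only groups containing more than one entity (i.e. conflicts) are returned.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for entity_id, language, text in labels:
        groups[(language, key(text))].append(entity_id)
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


labels = [("Q1", "en", "foo"), ("Q2", "en", "foO")]

# Comparing raw text (like term_text): no conflict is found.
print(find_label_conflicts(labels, key=lambda s: s))  # {}
# Comparing a case-folded key: the two labels now conflict.
print(find_label_conflicts(labels, key=str.casefold))  # {('en', 'foo'): ['Q1', 'Q2']}
```

The key function is the only thing that changes between the two behaviours, which is why switching from term_search_key to term_text silently narrows what counts as a conflict.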
Some more information – using term_text doesn’t only mean case sensitivity in the search, it also means that «thé» and «thé» will be considered different (one uses a precomposed character, one a combining diacritic), or “foo” and “foo” (the second one starts with an LTR mark), or “foo” and “foo ”. The term_search_key normalization takes care of all that for us (Unicode normalization, removal of control characters, stripping of leading and trailing whitespace, and finally case conversion).
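A rough approximation of the normalization steps listed above might look like the following sketch. It assumes NFC as the normalization form, treats Unicode categories Cc (control) and Cf (format, which includes the LTR mark U+200E) as the characters to remove, and uses plain lowercasing for case conversion. The function name `search_key` is hypothetical, and the actual Wikibase code may differ in these details (for example, it might replace control characters with spaces rather than dropping them).

```python
import unicodedata


def search_key(text: str) -> str:
    """Approximate the term_search_key normalization (assumed details)."""
    # 1. Unicode normalization (NFC assumed): "e" + U+0301 becomes "é" (U+00E9).
    text = unicodedata.normalize("NFC", text)
    # 2. Remove control (Cc) and format (Cf) characters, e.g. U+200E LEFT-TO-RIGHT MARK.
    text = "".join(ch for ch in text if unicodedata.category(ch) not in ("Cc", "Cf"))
    # 3. Strip leading and trailing whitespace.
    text = text.strip()
    # 4. Case conversion (simple lowercasing assumed).
    return text.lower()


pairs = [
    ("th\u00e9", "the\u0301"),  # precomposed character vs. combining diacritic
    ("foo", "\u200efoo"),       # second string starts with an LTR mark
    ("foo", "foo "),            # trailing space
    ("foo", "foO"),             # case difference
]
for a, b in pairs:
    # The raw texts differ in every pair, but their search keys match.
    print(a == b, search_key(a) == search_key(b))  # False True
```

Every pair here would be treated as distinct by a term_text comparison, which is exactly the set of near-duplicates that conflict detection would stop catching.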
@Lydia_Pintscher any opinion on this? Do you think it’s acceptable to detect conflicts based only on the pure, non-normalized term text?