Lucas_Werkmeister_WMDE added a comment.

Before we can wipe the column, we need to decide what to do about conflict detection: we can either add another LabelConflictFinder implementation based on Cirrus/Elastic, or use term_text instead of term_search_key in conflict detection (and accept that “foo” and “foO” will no longer be detected as a conflict).

Some more information – using term_text doesn’t only mean case sensitivity in the search, it also means that «thé» and «thé» will be considered different (one uses a precomposed character, one a combining diacritic), or “foo” and “‎foo” (the second one starts with an LTR mark), or “foo” and “foo ”. The term_search_key normalization takes care of all that for us (Unicode normalization, removal of control characters, stripping of leading and trailing whitespace, and finally case conversion).
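
To make the four steps concrete, here is a minimal Python sketch of that kind of normalization. It is an illustration only, not the actual Wikibase code: the function name search_key is made up, and the choice of NFC and of dropping the Unicode categories Cc and Cf are assumptions about what “Unicode normalization” and “removal of control characters” mean here.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical stand-in for the term_search_key normalization."""
    # 1. Unicode normalization (NFC is an assumption): a base letter plus
    #    combining diacritic becomes the precomposed character.
    text = unicodedata.normalize("NFC", text)
    # 2. Drop control (Cc) and format (Cf) characters. The LTR mark
    #    U+200E is in category Cf, so it is removed here.
    text = "".join(
        ch for ch in text if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # 3. Strip leading and trailing whitespace.
    text = text.strip()
    # 4. Case conversion (lowercasing is an assumption).
    return text.lower()


# The pairs from the comment above, written with escapes so the
# invisible differences are visible in the source.
pairs = [
    ("foo", "foO"),             # case
    ("th\u00e9", "the\u0301"),  # precomposed vs. combining diacritic
    ("foo", "\u200efoo"),       # leading LTR mark
    ("foo", "foo "),            # trailing whitespace
]

for a, b in pairs:
    # Raw comparison is what conflict detection on term_text would see;
    # the normalized comparison is what term_search_key gives us today.
    print(repr(a), repr(b), "raw equal:", a == b,
          "normalized equal:", search_key(a) == search_key(b))
```

Under these assumptions, each pair differs as raw text but maps to the same key, which is the behaviour we would lose by comparing term_text directly.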

@Lydia_Pintscher any opinion on this? Do you think it’s acceptable to detect conflicts based only on the pure, non-normalized term text?


