bug#36718: uniq treats distinct Korean characters equal

2019-07-19 Thread Felix Hamme
Thanks @Paul Eggert, it seems this isn't a bug at all. My locale (de_DE.utf8) appears to lack definitions for the Korean characters mentioned. After setting my system language to Korean (ko_KR.utf8), uniq produces the expected output. For my purposes, I'll set my environment to LC_COLLATE=C,

bug#36718: uniq treats distinct Korean characters equal

2019-07-18 Thread Paul Eggert
uniq just calls strcoll, and if strcoll (A, B) returns 0 then uniq assumes the lines are equal. So my guess is that your problem has something to do with strcoll, not with coreutils per se.
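
The comparison described above can be tried outside of uniq. The following is only a minimal sketch, not the coreutils code: it uses Python's locale.strcoll, which wraps the C library's collation routine, and the helper name lines_equal is made up for illustration. Results for any locale other than "C" depend on the locale data installed on the system.

```python
import locale


def lines_equal(a: str, b: str, collate_locale: str = "") -> bool:
    """Return True if strcoll reports a and b as equal under collate_locale.

    Hypothetical helper mirroring the rule "strcoll (A, B) == 0 means equal".
    An empty collate_locale means "take LC_COLLATE from the environment".
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
        return locale.strcoll(a, b) == 0
    finally:
        # Restore the collation locale that was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    first, second = "\ud504", "\ud2c0"  # U+D504 and U+D2C0 from the report
    for name in ("C", ""):
        label = name or "(environment)"
        try:
            print(label, lines_equal(first, second, name))
        except locale.Error as exc:
            # The requested locale is not installed on this system.
            print(label, "unavailable:", exc)
```

Under the "C" locale the two characters compare as different, which matches the LC_COLLATE=C approach mentioned in this thread. Whether the environment's locale reports them as equal depends on its collation tables.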

bug#36718: uniq treats distinct Korean characters equal

2019-07-18 Thread Felix Hamme
Dear all, I found that, when performing uniq on some Korean characters, it treats them as equal (counts them as duplicates) although the characters aren't equal. To be precise, it happened to me with the characters 프 (U+D504) and 틀 (U+D2C0). An example (input, expected output, actual output) can be