Thanks @Paul Eggert, it seems like this isn't a bug at all.
My locale (de_DE.utf8) appears to lack definitions for the mentioned
Korean characters. After setting my system language to Korean
(ko_KR.utf8), uniq produces the expected output.
For my purposes, I'll set LC_COLLATE=C in my environment.
uniq just calls strcoll, and if strcoll (A, B) returns 0 then uniq assumes the
lines are equal. So my guess is that your problem has something to do with
strcoll, not with coreutils per se.
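
To make the strcoll point concrete, here is a minimal sketch in C. It is not uniq's actual source; the helper name collate_cmp is made up for illustration. It switches LC_COLLATE to a named locale and reports what strcoll returns for two strings. The two byte strings in the usage note are the UTF-8 encodings of U+D504 and U+D2C0.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of coreutils.
   Sets LC_COLLATE to locale_name and compares a and b with strcoll.
   Returns 0 and stores the strcoll result in *result on success,
   or -1 if the locale could not be set (e.g. it is not installed). */
int collate_cmp(const char *locale_name, const char *a, const char *b,
                int *result)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    *result = strcoll(a, b);
    return 0;
}
```

Calling collate_cmp("C", "\xED\x94\x84", "\xED\x8B\x80", &r) should leave a nonzero value in r, since the C locale collates like a plain byte comparison. Passing "de_DE.utf8" instead would show whether strcoll returns 0 for the pair on a given system, assuming that locale is installed there.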
Dear all,
I found that, when running uniq on some Korean characters, it treats
them as equal (counts them as duplicates) although the characters aren't
equal. To be precise, it happened to me with the characters 프 (U+D504) and
틀 (U+D2C0).
An example (input, expected output, actual output) can be