On Tue, May 14, 2019 at 05:55:21PM +0200, Ingo Schwarze wrote:
> Hi,
>
> after my LC_NUMERIC cleanup for sort(1) went in (thanks to tb@ for
> the review), i'd like to address the rest of locale dependency.
>
> Large amounts of extremely ugly code in sort(1) - many hundreds of
> lines - deal with LC_COLLATE, which we don't support now and have
> no intention to support in the future.
>
> The code is very repetitive and currently written to handle three cases:
>
> 1. byte_sort == true && sort_mb_cur_max == 1
>
>    That is the only mode currently supported on OpenBSD.
>    It means everything uses the POSIX locale and ASCII.
>
> 2. byte_sort == false && sort_mb_cur_max == 1
>
>    That will never be supported on OpenBSD.
>    It handles 8-bit single byte character encodings which are
>    incompatible with UTF-8, for example ISO-LATIN-1.
>
> 3. byte_sort == false && sort_mb_cur_max > 1
>
>    Even though i doubt we will ever do it, that could theoretically
>    happen on OpenBSD in the remote future, if we ever choose to
>    implement collation support for UTF-8 locales.
>
> Handling case 3 would be a massive undertaking - not just a matter
> of improving Unicode support, but also forcing us to maintain many
> different UTF-8 locales for many different languages, which means
> extremely messy stuff invading the C library.  During the Belgrade
> EuroBSDCon a few years ago, i talked to Baptiste Daroussin who had
> just implemented LC_COLLATE in FreeBSD libc and who was utterly
> scared by the complexity.  Knowing ourselves, we would be scared
> even more once we got there.  So it will definitely not happen
> quickly.  Then again, ruling that out for good is maybe not a
> decision to make in this particular patch.
>
> Consequently, the byte_sort variable can be deleted immediately,
> killing case 2 for good, but i'm keeping the sort_mb_cur_max variable
> as a global constant for now, even though more than half of the
> code it controls is currently dead code.
>
> Since none of our single-byte character and string functions are
> locale dependent, we can also zap LC_CTYPE while here.
>
> After committing this patch, i shall re-indent bwscoll() properly
> in a separate commit, but i'm not including that in the patch sent
> out here because it would make the patch unreadable.
>
> OK?
ok
