Re: diacrit: mark deprecated

2019-07-17 Thread Bruno Haible
Bernhard Voelker wrote:
> In GNU coreutils, we now get this warning during bootstrap:
> 
>   Notice from module diacrit:
>     This module is deprecated. Use the module
>     'uninorm/canonical-decomposition' instead.
> 
> And indeed, the 'diacrit' module is still in use by 1 source:
> 
>   $ GIT_PAGER= git grep -En 'todiac|tobase'
>   src/ptx.c:1053:  diacritic = todiac (character);
>   src/ptx.c:1056:  base = tobase (character);
>   src/ptx.c:1338:edited_flag[character] = todiac (character) != 0;

Indeed, 'ptx' does not yet support multibyte locales.

$ echo "Böse Bübchen" | ptx -
   Böse Bübchen
   Böse   Bübchen
   Böse Bü   bchen
 Bö   se Bübchen
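
To illustrate why a byte-oriented tool stumbles on this input, here is a minimal sketch in Python (chosen for brevity; it is not ptx's code and does not use gnulib). It only shows that "ö" is one byte in ISO-8859-1 but two bytes in UTF-8, so a program that treats every byte as a character sees two unrelated 8-bit characters in its place.

```python
# Illustration only: how a byte-oriented program sees UTF-8 text.
text = "Böse"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # "ö" becomes the two bytes 0xC3 0xB6

print(len(text), len(latin1_bytes), len(utf8_bytes))

# Reading the UTF-8 bytes as if each byte were an ISO-8859-1 character
# splits "ö" into two characters that have nothing to do with "o".
misread = utf8_bytes.decode("latin-1")
print(misread)
```

Any per-byte table lookup, such as the one an 8-bit diacritics table implies, would be applied to each of those two bytes separately.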

It looks even weirder with Greek input:

$ echo "Το τέλος του Ψυχρού Πολέμου και η διάλυση της Σοβιετικής Ένωσης άφησαν τις Ηνωμένες Πολιτείες για ένα διάστημα ως τη μόνη υπερδύναμη." | ptx -

(No output at all!)

Bruno


Re: diacrit: mark deprecated

2019-07-17 Thread Bernhard Voelker
[adding coreutils: discussion at
 https://lists.gnu.org/r/bug-gnulib/2019-01/msg00116.html ]

On 1/21/19 6:01 AM, Jim Meyering wrote:
> On Sun, Jan 20, 2019 at 5:11 PM Bruno Haible wrote:
>> Hi Jim,
>>
>> You are listed as the maintainer of the 'diacrit' module. I doubt anyone is
>> still using this module, because it assumes an 8-bit character set, whereas
>> most systems switched to UTF-8 10 to 18 years ago. Do you agree to mark
>> it deprecated?
> 
> Hi Bruno, that's fine with me.
> Thanks

In GNU coreutils, we now get this warning during bootstrap:

  Notice from module diacrit:
    This module is deprecated. Use the module
    'uninorm/canonical-decomposition' instead.

And indeed, the 'diacrit' module is still in use by 1 source:

  $ GIT_PAGER= git grep -En 'todiac|tobase'
  src/ptx.c:1053:  diacritic = todiac (character);
  src/ptx.c:1056:  base = tobase (character);
  src/ptx.c:1338:edited_flag[character] = todiac (character) != 0;
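
As a rough idea of what a canonical-decomposition approach offers, the following sketch splits a character into a base letter and its combining marks. It uses Python's standard `unicodedata` module as a stand-in, not the gnulib 'uninorm/canonical-decomposition' API, and the helper name `base_and_marks` is hypothetical. How ptx would actually be adapted is not settled in this thread.

```python
import unicodedata


def base_and_marks(ch):
    """Split one character into (base, combining marks) using NFD.

    Hypothetical helper for illustration; not part of gnulib or ptx.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    # Characters with a non-zero combining class are the diacritical marks.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base, marks


for ch in ("ö", "ύ", "B"):
    base, marks = base_and_marks(ch)
    print(repr(ch), "base:", repr(base), "marks:", [hex(ord(m)) for m in marks])
```

Unlike an 8-bit table, this works on whole Unicode characters, so it covers the Greek accented letters as well as the German umlauts.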

Have a nice day,
Berny