Re: c32width gives incorrect return values in C locale

2023-11-19 Thread Dagobert Michelsen
Hi,

> On 19.11.2023 at 00:55, Patrice Dumas wrote:
> On Sun, Nov 19, 2023 at 12:26:02AM +0100, Patrice Dumas wrote:
>> On Sat, Nov 18, 2023 at 09:10:09PM +, Gavin Smith wrote:
>>> On Wed, Nov 15, 2023 at 09:42:21AM +0100, Patrice Dumas wrote:
>>>> On Wed, Nov 15, 2023 at 12:22:24AM -0800, Paul Eggert wrote:
>>>>> On 2023-11-13 01:28, Patrice Dumas wrote:
>>>>>> According to your mail
>>>>>> https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
>>>>>>
>>>>>> char32_t is less portable
>>>>>
>>>>> That should be OK, if Gnulib provides a char32_t substitute that works well
>>>>> enough. The mail you refer to merely says that literals like U'x' don't
>>>>> work, but this is not a show-stopper for char32_t.
>>>>
>>>> Indeed, the solaris10 automatic build now compiles ok with char32_t with
>>>> Gnulib uchar after the changes Gavin made.
>>> 
>>> Is this the OpenCSW buildbot?
>> 
>> Yes.
>> 
>>> How are you checking this?
>>> 
>>> Every time I have checked
>>> 
>>> https://buildfarm.opencsw.org/buildbot/waterfall?tag=texinfo
>>> 
>>> recently, I have a page with a bunch of error messages on it:
>> 
>> At some point it worked again (presumably on the 15th of October), and I
>> could see that tests passed.
> 
> The 15th of November, not October...

The builder on Solaris 10 x86 was disconnected and Buildbot apparently also
had a runtime issue. I restarted everything accordingly and the builds
of texinfo should be clean again:
  https://buildfarm.opencsw.org/buildbot/waterfall?category=texinfo


Best regards

  — Dago

-- 
"You don't become great by trying to be great, you become great by wanting to
do something, and then doing it so hard that you become great in the process."
- xkcd #896




Re: c32width gives incorrect return values in C locale

2023-11-18 Thread Patrice Dumas
On Sun, Nov 19, 2023 at 12:26:02AM +0100, Patrice Dumas wrote:
> On Sat, Nov 18, 2023 at 09:10:09PM +, Gavin Smith wrote:
> > On Wed, Nov 15, 2023 at 09:42:21AM +0100, Patrice Dumas wrote:
> > > On Wed, Nov 15, 2023 at 12:22:24AM -0800, Paul Eggert wrote:
> > > > On 2023-11-13 01:28, Patrice Dumas wrote:
> > > > > According to your mail
> > > > > https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> > > > > 
> > > > > char32_t is less portable
> > > > 
> > > > That should be OK, if Gnulib provides a char32_t substitute that works 
> > > > well
> > > > enough. The mail you refer to merely says that literals like U'x' don't
> > > > work, but this is not a show-stopper for char32_t.
> > > 
> > > Indeed, the solaris10 automatic build now compiles ok with char32_t with
> > > Gnulib uchar after the changes Gavin made.
> > 
> > Is this the OpenCSW buildbot?
> 
> Yes.
> 
> >  How are you checking this?
> > 
> > Every time I have checked
> > 
> > https://buildfarm.opencsw.org/buildbot/waterfall?tag=texinfo
> > 
> > recently, I have a page with a bunch of error messages on it:
> 
> At some point it worked again (presumably on the 15th of October), and I
> could see that tests passed.

The 15th of November, not October...

-- 
Pat



Re: c32width gives incorrect return values in C locale

2023-11-18 Thread Patrice Dumas
On Sat, Nov 18, 2023 at 09:10:09PM +, Gavin Smith wrote:
> On Wed, Nov 15, 2023 at 09:42:21AM +0100, Patrice Dumas wrote:
> > On Wed, Nov 15, 2023 at 12:22:24AM -0800, Paul Eggert wrote:
> > > On 2023-11-13 01:28, Patrice Dumas wrote:
> > > > According to your mail
> > > > https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> > > > 
> > > > char32_t is less portable
> > > 
> > > That should be OK, if Gnulib provides a char32_t substitute that works 
> > > well
> > > enough. The mail you refer to merely says that literals like U'x' don't
> > > work, but this is not a show-stopper for char32_t.
> > 
> > Indeed, the solaris10 automatic build now compiles ok with char32_t with
> > Gnulib uchar after the changes Gavin made.
> 
> Is this the OpenCSW buildbot?

Yes.

>  How are you checking this?
> 
> Every time I have checked
> 
> https://buildfarm.opencsw.org/buildbot/waterfall?tag=texinfo
> 
> recently, I have a page with a bunch of error messages on it:

At some point it worked again (presumably on the 15th of October), and I
could see that tests passed.

But now I get only errors again, same as you do.

-- 
Pat



Re: c32width gives incorrect return values in C locale

2023-11-18 Thread Gavin Smith
On Wed, Nov 15, 2023 at 09:42:21AM +0100, Patrice Dumas wrote:
> On Wed, Nov 15, 2023 at 12:22:24AM -0800, Paul Eggert wrote:
> > On 2023-11-13 01:28, Patrice Dumas wrote:
> > > According to your mail
> > > https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> > > 
> > > char32_t is less portable
> > 
> > That should be OK, if Gnulib provides a char32_t substitute that works well
> > enough. The mail you refer to merely says that literals like U'x' don't
> > work, but this is not a show-stopper for char32_t.
> 
> Indeed, the solaris10 automatic build now compiles ok with char32_t with
> Gnulib uchar after the changes Gavin made.

Is this the OpenCSW buildbot?  How are you checking this?

Every time I have checked

https://buildfarm.opencsw.org/buildbot/waterfall?tag=texinfo

recently, I have a page with a bunch of error messages on it:


web.Server Traceback (most recent call last):
twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback:
: (OperationalError) unable to open database file None None
/opt/csw/lib/python2.7/threading.py:774:__bootstrap
/opt/csw/lib/python2.7/threading.py:801:__bootstrap_inner
/opt/csw/lib/python2.7/threading.py:754:run
---  ---
/opt/csw/lib/python2.7/site-packages/twisted/python/threadpool.py:191:_worker
/opt/csw/lib/python2.7/site-packages/twisted/python/context.py:118:callWithContext
/opt/csw/lib/python2.7/site-packages/twisted/python/context.py:81:callWithContext
/opt/csw/lib/python2.7/site-packages/buildbot/db/pool.py:185:__thd
/opt/csw/lib/python2.7/site-packages/sqlalchemy/engine/threadlocal.py:61:contextual_connect
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:272:connect
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:431:__init__
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:867:_do_get
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:225:_create_connection
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:318:__init__
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:379:__connect
/opt/csw/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py:80:connect
/opt/csw/lib/python2.7/site-packages/sqlalchemy/engine/default.py:283:connect ]]







Re: c32width gives incorrect return values in C locale

2023-11-15 Thread Patrice Dumas
On Wed, Nov 15, 2023 at 12:22:24AM -0800, Paul Eggert wrote:
> On 2023-11-13 01:28, Patrice Dumas wrote:
> > According to your mail
> > https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> > 
> > char32_t is less portable
> 
> That should be OK, if Gnulib provides a char32_t substitute that works well
> enough. The mail you refer to merely says that literals like U'x' don't
> work, but this is not a show-stopper for char32_t.

Indeed, the solaris10 automatic build now compiles ok with char32_t with
Gnulib uchar after the changes Gavin made.

-- 
Pat



Re: c32width gives incorrect return values in C locale

2023-11-15 Thread Paul Eggert

On 2023-11-13 01:28, Patrice Dumas wrote:
> According to your mail
> https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
>
> char32_t is less portable

That should be OK, if Gnulib provides a char32_t substitute that works
well enough. The mail you refer to merely says that literals like U'x'
don't work, but this is not a show-stopper for char32_t.





Re: c32width gives incorrect return values in C locale

2023-11-13 Thread Gavin Smith
On Mon, Nov 13, 2023 at 10:28:35AM +0100, Patrice Dumas wrote:
> According to your mail
> https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> 
> char32_t is less portable, and indeed, the solaris automatic build seems
> to fail because char32_t is not available, while uint32_t and ucs4_t
> are necessarily present with libunistring unless I missed something.  So
> it would probably be more portable to use uint32_t/ucs4_t, though it may
> be less practical if a conversion is needed before they can be used.

I've imported the "uchar" gnulib module so that the char32_t type will
be defined in uchar.h.  "uchar" seems significantly less bloated than
"uchar-c23", and I'm not sure whether "uchar-c23" is necessary for our purposes.



Re: c32width gives incorrect return values in C locale

2023-11-13 Thread Patrice Dumas
On Sat, Nov 11, 2023 at 11:54:52PM +0100, Bruno Haible wrote:
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from <uchar.h> (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from <stdint.h>.

According to your mail
https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html

char32_t is less portable, and indeed, the solaris automatic build seems
to fail because char32_t is not available, while uint32_t and ucs4_t
are necessarily present with libunistring unless I missed something.  So
it would probably be more portable to use uint32_t/ucs4_t, though it may
be less practical if a conversion is needed before they can be used.

-- 
Pat



Re: c32width gives incorrect return values in C locale

2023-11-12 Thread Gavin Smith
On Sat, Nov 11, 2023 at 11:54:52PM +0100, Bruno Haible wrote:
> [CCing bug-libunistring]
> Gavin Smith wrote:
> > I did not understand why uc_width was said to be "locale dependent":
> > 
> >   "These functions are locale dependent."
> > 
> > - from 
> > .
> 
> That's because some Unicode characters have "ambiguous width" — width 1 in
> Western locales, width 2 in East Asian locales (for historic and font choice
> reasons).
> 
> > I also don't understand the purpose of the "encoding" argument -- can this
> > always be "UTF-8"?
> 
> Yes, it can be always "UTF-8"; then uc_width will always choose width 1 for
> these characters.
> 
> > I'm also unclear on the exact relationship between the types char32_t,
> > ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> > but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> > I used a cast to ucs4_t when calling uc_width.
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from <uchar.h> (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from <stdint.h>.

Thanks for the advice.



Re: c32width gives incorrect return values in C locale

2023-11-11 Thread Eli Zaretskii
> From: Bruno Haible 
> Cc: bug-libunistr...@gnu.org
> Date: Sat, 11 Nov 2023 23:54:52 +0100
> 
> [CCing bug-libunistring]
> Gavin Smith wrote:
> > I did not understand why uc_width was said to be "locale dependent":
> > 
> >   "These functions are locale dependent."
> > 
> > - from 
> > .
> 
> That's because some Unicode characters have "ambiguous width" — width 1 in
> Western locales, width 2 in East Asian locales (for historic and font choice
> reasons).

I think this should be explained in the documentation, if it isn't
already.  This "ambiguous width" issue is very subtle and unknown to
many (most?) people, so not having it explicit in the documentation is
not user-friendly, IMO.

> > I also don't understand the purpose of the "encoding" argument -- can this
> > always be "UTF-8"?
> 
> Yes, it can be always "UTF-8"; then uc_width will always choose width 1 for
> these characters.

Regardless of the locale?  Is there an assumption that UTF-8 means
"not CJK" or something?

> > I'm also unclear on the exact relationship between the types char32_t,
> > ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> > but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> > I used a cast to ucs4_t when calling uc_width.
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from <uchar.h> (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from <stdint.h>.

AFAIU, char32_t is identical to uint_least32_t (which is also from
stdint.h).



Re: c32width gives incorrect return values in C locale

2023-11-11 Thread Bruno Haible
[CCing bug-libunistring]
Gavin Smith wrote:
> I did not understand why uc_width was said to be "locale dependent":
> 
>   "These functions are locale dependent."
> 
> - from 
> .

That's because some Unicode characters have "ambiguous width" — width 1 in
Western locales, width 2 in East Asian locales (for historic and font choice
reasons).

> I also don't understand the purpose of the "encoding" argument -- can this
> always be "UTF-8"?

Yes, it can be always "UTF-8"; then uc_width will always choose width 1 for
these characters.

> I'm also unclear on the exact relationship between the types char32_t,
> ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> I used a cast to ucs4_t when calling uc_width.

These types are all identical. Therefore you don't even need to cast.

  - char32_t comes from <uchar.h> (ISO C 11 or newer).
  - ucs4_t comes from GNU libunistring.
  - uint32_t comes from <stdint.h>.

Bruno







Re: c32width gives incorrect return values in C locale

2023-11-11 Thread Gavin Smith
On Sat, Nov 11, 2023 at 09:06:41PM +0100, Bruno Haible wrote:
> [CCing bug-gnulib]
> Indeed, the c32* functions by design work only on those Unicode characters
> that can be represented as multibyte sequences in the current locale.
> 
> I'll document this better in the Gnulib manual.
> 
> Since you want texinfo to work on UTF-8 encoded text with characters outside
> the repertoire of the current locale, you'll need the libunistring functions,
> documented in
> .
> Namely, replace c32width with uc_width.

Thanks, that seems to work perfectly.

I also changed c32isupper to uc_is_upper.  The gnulib manual stated
(node "isupper"):

  ‘c32isupper’
   This function operates in a locale dependent way, on 32-bit wide
   characters.  In order to use it, you first have to convert from
   multibyte to 32-bit wide characters, using the ‘mbrtoc32’ function.
   It is provided by the Gnulib module ‘c32isupper’.
  
  ...
  
  ‘uc_is_upper’
   This function operates in a locale independent way, on Unicode
   characters.  It is provided by the Gnulib module
   ‘unictype/ctype-upper’.

- and we wanted the "locale independent way".

I did not understand why uc_width was said to be "locale dependent":

  "These functions are locale dependent."

- from 
.

I also don't understand the purpose of the "encoding" argument -- can this
always be "UTF-8"?

I'm also unclear on the exact relationship between the types char32_t,
ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
but u8_mbtouc writes to a char32_t variable.  In the code I committed,
I used a cast to ucs4_t when calling uc_width.



Re: c32width gives incorrect return values in C locale

2023-11-11 Thread Bruno Haible
[CCing bug-gnulib]
Gavin Smith wrote:
> > I guess you will need to look at the Unicode characters that you pass to 
> > c32width,
> > and whether you get return values < 1 for some of them.
> 
> It is locale-dependent!
> 
> It looks like c32width is simply being redirected to wcwidth which then
> doesn't work properly with LC_ALL=C.  This is from the gnulib module
> c32width.
> 
> I don't know if there is an easy way to make a self-contained example
> to show the difference, because it needs all the gnulib Makefile machinery,
> but the difference shows up for any non-ASCII character.  If I add a line
> like
> 
>  fprintf (stderr, "width of [%4.0lx] is %d (remaining %s)\n",
>           (unsigned long) wc, width, q);
> 
> in the right place in the code, where width is the result of c32width,
> then the output looks like
> 
> width of [  40] is 1 (remaining @)
> width of [  4f] is 1 (remaining OE )
> width of [  45] is 1 (remaining E )
> width of [ 152] is -1 (remaining Œ)
> width of [  28] is 1 (remaining (Œ)
> 
> for LC_ALL=C, but
> 
> width of [  40] is 1 (remaining @)
> width of [  4f] is 1 (remaining OE )
> width of [  45] is 1 (remaining E )
> width of [ 152] is 1 (remaining Œ)
> width of [  28] is 1 (remaining (Œ)
> 
> otherwise (LC_ALL=en_GB.UTF-8).

Indeed, the c32* functions by design work only on those Unicode characters
that can be represented as multibyte sequences in the current locale.

I'll document this better in the Gnulib manual.

Since you want texinfo to work on UTF-8 encoded text with characters outside
the repertoire of the current locale, you'll need the libunistring functions,
documented in
.
Namely, replace c32width with uc_width.

Bruno