Re: coreutils-9.4.170-7b206 uniq/uniq test failure

2024-03-26 Thread Bruno Haible
Pádraig Brady wrote:
> I'll apply the attached anyway which should avoid this issue.

Thanks. I confirm that it fixes the failure.






Re: coreutils-9.4.170-7b206 uniq/uniq test failure

2024-03-26 Thread Bruno Haible
Pádraig Brady wrote:
> > You should also have access to cfarm215.cfarm.net and cfarm216.cfarm.net
> > (just announced today).
> 
> Nice machines.
> Though I can't repro there, I'm guessing because they have
> both multibyte and unibyte fr locales installed.

I had seen the issue on Solaris 11 OpenIndiana only, not on Solaris 11.4.

Solaris 11 OpenIndiana and Solaris 11.4 are very similar, but there are
small differences.

Bruno






Re: coreutils-9.4.170-7b206 uniq/uniq test failure

2024-03-26 Thread Pádraig Brady

On 25/03/2024 22:28, Bruno Haible wrote:
> Pádraig Brady wrote:
> > while uniq (c32isblank) now determines
> > it is not blank (which seems more correct).
>
> I agree that U+00A0 NO-BREAK SPACE should better be considered to be non-blank
> (and Gnulib's c32isblank does so).
>
> > The only solaris 11 system I have access to, only has the fr_FR.UTF-8 locale installed,
> > not the unibyte version
>
> You should also have access to cfarm215.cfarm.net and cfarm216.cfarm.net
> (just announced today).


Nice machines.
Though I can't repro there, I'm guessing because they have
both multibyte and unibyte fr locales installed.
I'll apply the attached anyway which should avoid this issue.

cheers,
Pádraig
From 6b3f75701de074da27ec7995c2d9174914f44ef4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Tue, 26 Mar 2024 15:02:17 +
Subject: [PATCH] tests: avoid false failure due to mismatched isblank()

There is a mismatch between isblank() used by tr and c32isblank() now
used by uniq. On Solaris 11 OpenIndiana isblank() was seen to return
true for non breaking space, while c32isblank() returned false. This may
have been because only the single byte fr locale was installed.
Interestingly on Solaris non breaking space is considered a blank
character, and isblank() and c32isblank() honor this where the
appropriate locales are installed.

* tests/uniq/uniq.pl: Adjust the blank check to use join(1) rather than
tr(1), as join uses the same blank determination routines as uniq(1).
---
 tests/uniq/uniq.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/uniq/uniq.pl b/tests/uniq/uniq.pl
index 34457b000..a61f26485 100755
--- a/tests/uniq/uniq.pl
+++ b/tests/uniq/uniq.pl
@@ -243,8 +243,9 @@ if ( defined $locale && $locale ne 'none' )
   {
 # I've only ever triggered the problem in a non-C locale.
 
-# See if isblank returns true for nbsp.
-my $x = qx!env printf '\xa0'| LC_ALL=$locale tr '[:blank:]' x!;
+# See if nbsp is considered a blank character
+my $x = qx!env printf 'x\xa0y'| LC_ALL=$locale join -a2 -o2.1 /dev/null -!;
+chomp $x;
 # If so, expect just one line of output in the schar test.
 # Otherwise, expect two.
 my $in = " y z\n\xa0 y z\n";
-- 
2.44.0
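
For readers following the patch, the new probe can be sketched as a standalone shell function. This is a minimal sketch, not part of the commit: the helper name join_sees_blank is hypothetical, the separator is passed as an argument rather than hard-coded, and the locale name in the usage lines is an assumption that may not be installed on a given system. With /dev/null as file 1, -a2 prints the single input line as unpairable and -o2.1 keeps only its first field, so the output is just "x" exactly when join splits the line at the separator.

```shell
# Hypothetical helper (not in coreutils): succeed if join(1), running
# under locale $1, treats the byte string $2 as a field separator.
join_sees_blank() {
  # Input is one line: x<sep>y.  File 1 is empty, so -a2 prints the
  # line as unpairable and -o2.1 keeps only its first field.
  _out=$(printf 'x%sy\n' "$2" | LC_ALL=$1 join -a2 -o2.1 /dev/null -)
  [ "$_out" = x ]
}

# Usage: probe NBSP (octal 240 is the byte 0xa0) in a chosen locale.
# fr_FR.ISO-8859-1 is an assumed locale name; it may not be installed,
# in which case the tools typically fall back to C locale behavior.
nbsp=$(printf '\240')
if join_sees_blank fr_FR.ISO-8859-1 "$nbsp"; then
  echo "join treats NBSP as blank"
else
  echo "join treats NBSP as non-blank"
fi
```

The octal escape is used here because '\xa0' is not handled by every printf implementation, which is presumably why the test itself calls "env printf".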



Re: coreutils-9.4.170-7b206 uniq/uniq test failure

2024-03-25 Thread Bruno Haible
Pádraig Brady wrote:
> while uniq (c32isblank) now determines
> it is not blank (which seems more correct).

I agree that U+00A0 NO-BREAK SPACE should better be considered to be non-blank
(and Gnulib's c32isblank does so).

> The only solaris 11 system I have access to, only has the fr_FR.UTF-8 locale installed,
> not the unibyte version

You should also have access to cfarm215.cfarm.net and cfarm216.cfarm.net
(just announced today).

Bruno






Re: coreutils-9.4.170-7b206 uniq/uniq test failure

2024-03-25 Thread Pádraig Brady

On 24/03/2024 16:15, Bruno Haible wrote:
> The uniq/uniq test fails on Solaris 11 OpenIndiana.
>
> test-suite.log on Solaris 11 OpenIndiana:
>
> FAIL: tests/uniq/uniq
> =
>
> schar...
> uniq: test schar: stdout mismatch, comparing schar.2 (expected) and schar.O (actual)
> *** schar.2 Sun Mar 24 13:24:44 2024
> --- schar.O Sun Mar 24 13:24:44 2024
> ***
> *** 1 
> --- 1,2 
> y z
> + � y z


I think the issue here is a mismatch between
the system's isblank() and gnulib's c32isblank().
I.e. Solaris 11 isblank() returns true for non breaking space.
The coreutils test uses tr (isblank) to determine whether \xa0 is blank,
which it is on this system, while uniq (c32isblank) now determines
it is not blank (which seems more correct).



The only Solaris 11 system I have access to only has the fr_FR.UTF-8 locale installed,
not the unibyte version, but interestingly on that system I can see that the system uniq
treats non-breaking space as blank. I.e. isblank() also seems to be true on Solaris
for non-breaking space in UTF-8.

  $ printf " y z\n\xc2\xa0 y z\n" | LC_ALL=fr_FR.UTF-8 uniq -f1
   y z
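
The two determinations discussed above can be probed side by side. What follows is a minimal sketch, assuming a POSIX sh with tr and uniq on PATH; the helper names tr_sees_blank and uniq_sees_blank are hypothetical, and the uniq input mirrors the schar test data (" y z" followed by "<byte> y z").

```shell
# Hypothetical helpers for comparing the two blank determinations.

# Succeed if tr, under locale $1, rewrites $2 via the [:blank:] class.
tr_sees_blank() {
  _out=$(printf '%s' "$2" | LC_ALL=$1 tr '[:blank:]' x)
  [ "$_out" != "$2" ]
}

# Succeed if uniq -f1, under locale $1, skips $2 as a leading blank, so
# that " y z" and "$2 y z" compare equal and collapse to one line.
uniq_sees_blank() {
  _out=$(printf ' y z\n%s y z\n' "$2" | LC_ALL=$1 uniq -f1)
  [ "$_out" = ' y z' ]
}

nbsp=$(printf '\240')   # the single byte 0xa0
loc=${1:-C}             # locale to probe; defaults to C
if tr_sees_blank "$loc" "$nbsp"; then t=blank; else t=non-blank; fi
if uniq_sees_blank "$loc" "$nbsp"; then u=blank; else u=non-blank; fi
echo "tr: $t, uniq: $u"
```

When the two answers differ for a locale, a test that derives its expectation from tr will disagree with what uniq actually does, which is the failure mode described here. For a UTF-8 locale the byte sequence \xc2\xa0 would need to be passed instead, and tr may then not be a meaningful probe if it operates byte by byte.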

I think I'll leave the test as is for now
to at least highlight this mismatch.
Note the test was originally checking for signed char handling
(\xa0 being greater than \x7f), which may be overkill now given
the new character handling code in place.

cheers,
Pádraig