[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-08-01 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.2                        |14.3

--- Comment #12 from Jakub Jelinek  ---
GCC 14.2 is being released, retargeting bugs to GCC 14.3.

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-05-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2

--- Comment #11 from Richard Biener  ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-03-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #10 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #9 from Jakub Jelinek  ---
> (In reply to r...@cebitec.uni-bielefeld.de from comment #8)
>> FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
>> from the OpenSolaris sources, modified not to do that '?' conversion.
>> Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
>> more prevalent and would need to be eradicated upstream.
>
> If it is always '?' used instead of unknown character, we could also have some
> hack on the libcpp side for it.

It took me a bit to get back to you here since I had to both check with
Solaris engineering and dig up our old Solaris 9 sources (which, unlike
OpenSolaris, have no relevant parts missing due to copyright issues).

Both what I found in the iconv conversion tables and what's documented
in unicode_iconv(7) confirm the consistent use of '?'.  The manpage has

   If the source character code value is not within a range defined by the
   source  codeset  standard, it is considered as an illegal character. If
   the source character code value is within the range(s) defined  by  the
   standard,  it  will  be considered as non-identical, even if the source
   character code value maps to an undefined or a reserved location within
   the valid range. The non-identical character will map to either ? (0x3f
   in ASCII-compatible codesets) if the target codeset  is  a  non-Unicode
   codeset  or  to  Unicode  replacement  character (U+FFFD) if the target
   codeset is an Unicode codeset.

It will of course be in the respective charset's encoding (0x3f for
ASCII, 0x6f for EBCDIC), but that's all I could find.  This is not a
complete guarantee (I may well have missed something), but seems
plausible enough...
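
The point about the replacement byte depending on the target charset (0x3f
for ASCII, 0x6f for EBCDIC) could be handled by asking iconv itself how '?'
is encoded.  A minimal sketch, assuming a hypothetical helper named
query_qmark_encoding that is not part of libcpp; it ignores stateful
encodings, which would need an extra flush call:

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not libcpp code: convert the single UTF-8 byte '?'
   into charset TO and store the result in OUT.  Returns the number of
   bytes written, or 0 if the conversion could not be performed.  */
static size_t
query_qmark_encoding (const char *to, char *out, size_t outsize)
{
  iconv_t cd = iconv_open (to, "UTF-8");
  if (cd == (iconv_t) -1)
    return 0;

  char in[] = "?";
  char *inp = in;
  size_t inleft = 1;
  char *outp = out;
  size_t outleft = outsize;

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);

  /* Treat any error or unconsumed input as "unknown".  */
  if (rc == (size_t) -1 || inleft != 0)
    return 0;
  return outsize - outleft;
}
```

Calling this once per target charset, right after iconv_open, would give the
byte sequence to look for in converted output.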

> Like (but limited to Solaris hosts) in convert_using_iconv when converting from
> SOURCE_CHARSET to some other character set don't try to convert the whole UTF-8
> string at once, but split it into chunks at u'?' characters, so
> foo???bar?baz?qux
> would be iconv converted as
> foo
> ???
> bar
> ?
> baz
> ?
> qux
> chunks.  And when converting the non-? chunks, it would after the conversion
> check for the '?' character (in the destination character set - that is
> something that perhaps could be queried during initialization after iconv_open)
> and treat it as an error if it appeared there.  Or always convert also back to
> UTF-8 and check if it has more '?' characters than the source.

Unless we want to take the easy way out and just require GNU libiconv on
Solaris, that seems like a plausible way of handling the issue.
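
The round-trip alternative quoted above (convert back to UTF-8 and compare
the number of '?' characters) reduces to a simple count once both buffers
are available.  A rough sketch with hypothetical helper names; the two
iconv calls that produce the round-tripped buffer are assumed to have
happened already:

```c
#include <stddef.h>

/* Count the '?' bytes in S.  In UTF-8 the byte 0x3f never occurs inside a
   multibyte sequence, so a plain byte scan is enough.  */
static size_t
count_qmarks (const char *s, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if (s[i] == '?')
      n++;
  return n;
}

/* Hypothetical check, not libcpp code: SRC is the original UTF-8 text and
   BACK is the same text after UTF-8 -> target -> UTF-8.  Returns nonzero
   if the round trip introduced additional '?' characters, which would
   indicate a silent substitution.  */
static int
roundtrip_gained_qmarks (const char *src, size_t srclen,
                         const char *back, size_t backlen)
{
  return count_qmarks (back, backlen) > count_qmarks (src, srclen);
}
```

This costs a second conversion per string, but it does not need to know how
'?' is encoded in the target charset.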

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-03-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #9 from Jakub Jelinek  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #8)
> FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
> from the OpenSolaris sources, modified not to do that '?' conversion.
> Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
> more prevalent and would need to be eradicated upstream.

If it is always '?' used instead of unknown character, we could also have some
hack on the libcpp side for it.
Like (but limited to Solaris hosts) in convert_using_iconv when converting from
SOURCE_CHARSET to some other character set don't try to convert the whole UTF-8
string at once, but split it into chunks at u'?' characters, so
foo???bar?baz?qux
would be iconv converted as
foo
???
bar
?
baz
?
qux
chunks.  And when converting the non-? chunks, it would after the conversion
check for the '?' character (in the destination character set - that is
something that perhaps could be queried during initialization after iconv_open)
and treat it as an error if it appeared there.  Or always convert also back to
UTF-8 and check if it has more '?' characters than the source.
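
The chunking step described above can be sketched as follows.  This is only
an illustration with hypothetical helper names (next_chunk, count_chunks),
not existing libcpp code; an actual change would pass each non-'?' chunk to
iconv and reject any '?' found in its output:

```c
#include <stddef.h>

/* Starting at S[POS], return the end offset of the next chunk.  A chunk is
   either a maximal run of '?' bytes or a maximal run of other bytes;
   *IS_QMARK is set to 1 for the former and 0 for the latter.  Scanning
   bytes is safe for UTF-8 because 0x3f never appears inside a multibyte
   sequence.  */
static size_t
next_chunk (const char *s, size_t len, size_t pos, int *is_qmark)
{
  size_t end = pos;
  *is_qmark = (pos < len && s[pos] == '?');
  while (end < len && (s[end] == '?') == *is_qmark)
    end++;
  return end;
}

/* Walk S chunk by chunk and return how many chunks it contains.  */
static size_t
count_chunks (const char *s, size_t len)
{
  size_t pos = 0, n = 0;
  int is_qmark;
  while (pos < len)
    {
      pos = next_chunk (s, len, pos, &is_qmark);
      n++;
    }
  return n;
}
```

For the string foo???bar?baz?qux this yields the seven chunks listed in the
comment.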

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-03-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #8 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #7 from Jakub Jelinek  ---
> (In reply to r...@cebitec.uni-bielefeld.de from comment #6)
>> > --- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
>> >> --- Comment #4 from Jakub Jelinek  ---
>> >> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
>> >> that program is ill-formed if some character lacks encoding in the execution
>> >> character set, I'm afraid the Solaris iconv behavior results in violation of
>> 
>> Although I can barely wrap my head around the standardese there, I had a
>> look at n4928 (the last? C++23 draft), which has a different wording
>> here (p.25, 5.13.3):
>
> The testcase is for a C++26 feature, which made those ill-formed.

Should have been obvious from the pathname ;-(  N4971 has that wording...

>> The current Solaris iconv behaviour certainly isn't particularly
>> intuitive and I'll ask the Solaris engineers about it.  However, there's
>> the question what to do about the testcase?  Just xfail it on Solaris or
>> omit just the two affected subtests there?
>
> xfailing is one possibility, but then on Solaris we'll never support C++26
> properly.

I guess it's the best solution in the short term (GCC 14), though.

> Or require using GNU libiconv rather than Solaris iconv if it can't deal with
> that?

At least document the suggestion in install.texi; I wouldn't make it a
hard requirement yet.  I'll also wait for what the Solaris engineers can
provide as background for the current behaviour.

FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
from the OpenSolaris sources, modified not to do that '?' conversion.
Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
more prevalent and would need to be eradicated upstream.

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-03-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #7 from Jakub Jelinek  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #6)
> > --- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> >> --- Comment #4 from Jakub Jelinek  ---
> >> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
> >> that program is ill-formed if some character lacks encoding in the execution
> >> character set, I'm afraid the Solaris iconv behavior results in violation of
> 
> Although I can barely wrap my head around the standardese there, I had a
> look at n4928 (the last? C++23 draft), which has a different wording
> here (p.25, 5.13.3):

The testcase is for a C++26 feature, which made those ill-formed.

> The current Solaris iconv behaviour certainly isn't particularly
> intuitive and I'll ask the Solaris engineers about it.  However, there's
> the question what to do about the testcase?  Just xfail it on Solaris or
> omit just the two affected subtests there?

xfailing is one possibility, but then on Solaris we'll never support C++26
properly.
Or require using GNU libiconv rather than Solaris iconv if it can't deal with
that?

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-03-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #6 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
>> --- Comment #4 from Jakub Jelinek  ---
>> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
>> that program is ill-formed if some character lacks encoding in the execution
>> character set, I'm afraid the Solaris iconv behavior results in violation of

Although I can barely wrap my head around the standardese there, I had a
look at n4928 (the last? C++23 draft), which has a different wording
here (p.25, 5.13.3):

(3.1) — A character-literal with a c-char-sequence consisting of a
 single basic-c-char, simple-escape-sequence, or
 universal-character-name is the code unit value of the
 specified character as encoded in the literal’s associated
 character encoding.

 [Note 2 : If the specified character lacks representation in
 the literal’s associated character encoding or if it cannot be
 encoded as a single code unit, then the literal is a
 non-encodable character literal. —end note]

> I've not yet tried to understand what either iconv(3) has to say on the
> matter.

Digging further, Solaris iconv(3C) has

   If  iconv()  encounters  a character in the input buffer that is legal,
   but for which an identical character does not exist in the target  code
   set,  iconv()  performs  an  implementation-defined  conversion on this
   character.

which exactly matches XPG7, so the behaviour seems to be in line with
the standards.

I've also found that Solaris 11 has iconvctl(3C) (obviously patterned
after GNU libiconv) with

   ICONV_SET_TRANSLITERATE

   With  this  request  and  a  pointer to a const int with a non-zero
   value, caller can instruct the current conversion to  transliterate
   non-identical characters from the input buffer during the code con-
   version  as  much  as it can. The value of zero, on the other hand,
   turns it off.

However,

int transliterate = 0;
iconvctl (cd, ICONV_SET_TRANSLITERATE, &transliterate);

doesn't make a difference.

The current Solaris iconv behaviour certainly isn't particularly
intuitive and I'll ask the Solaris engineers about it.  However, there's
the question what to do about the testcase?  Just xfail it on Solaris or
omit just the two affected subtests there?

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2024-03-12 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #4 from Jakub Jelinek  ---
> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
> that program is ill-formed if some character lacks encoding in the execution
> character set, I'm afraid the Solaris iconv behavior results in violation of
> the C++ standard requirements, it is hard to argue that in the Solaris case
> e.g. ISO-8859-1 execution charset would be some special character set where ?
> character represents all Unicode characters which don't have a representation
> in the character set in addition to ?.

I've now started digging into this myself.

* Solaris iconv(1) says

   output. If no conversion exists for a particular character,  an  imple-
   mentation-defined conversion is performed on this character.

* This seems to at least partially match with XPG7:

-s  Suppress any messages written to standard error concerning invalid
characters. When -s is not used, the results of encountering invalid
characters in the input stream (either those that are not valid
characters in the codeset of the input file or that have no
corresponding character in the codeset of the output file) shall be
specified in the system documentation. The presence or absence of -s
shall not affect the exit status of iconv.

  AFAIU that's related to what Solaris iconv(1) does, although they
  don't specify the output '?' and produce no message.  However, they
  still exit with 0, which seems wrong to me.

I've not yet tried to understand what either iconv(3) has to say on the
matter.

> I'm afraid we don't want to maintain iconv replacement inside of libcpp though.

Agreed.

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
that program is ill-formed if some character lacks encoding in the execution
character set, I'm afraid the Solaris iconv behavior results in violation of
the C++ standard requirements, it is hard to argue that in the Solaris case
e.g. ISO-8859-1 execution charset would be some special character set where ?
character represents all Unicode characters which don't have a representation
in the character set in addition to ?.
I'm afraid we don't want to maintain iconv replacement inside of libcpp though.

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #3 from Jakub Jelinek  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #2)
> > --- Comment #1 from Jakub Jelinek  ---
> > Strange.  On cfarm211 which is
> > SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> > the test passes.
> 
> Can you check which libiconv got picked up there?  The non-standard
> OpenCSW packages on that system may include GNU libiconv and install
> into default system directories, so they are picked up by default.

/opt/csw/lib/libiconv.so.2
> 
> > You get no diagnostics for those lines at all?  Buggy libiconv?
> 
> No.  There's no separate libiconv on Solaris; the iconv* functions are
> included in libc.

On Linux I get:
echo á | iconv -f UTF-8 -t ASCII -; echo 😁 | iconv -f UTF-8 -t ISO-8859-1 -
iconv: illegal input sequence at position 0
iconv: illegal input sequence at position 0
while on Solaris
echo á | iconv -f UTF-8 -t ASCII -; echo 😁 | iconv -f UTF-8 -t ISO-8859-1 -
?
?
If it maps all characters which do not have representation in the destination
character set into ?, then it is useless for the test in question.
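
The same difference can be observed through the iconv(3) API rather than the
command-line tool.  Below is a minimal sketch with hypothetical names
(conv_result, try_convert).  It relies on POSIX specifying that iconv()
returns the number of non-identical conversions performed; whether Solaris
reports a nonzero count for the '?' substitution is not stated in this
thread, so CONV_CLEAN alone should not be taken as proof of a lossless
conversion there:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

enum conv_result
{
  CONV_CLEAN,        /* converted, no non-identical conversions reported */
  CONV_SUBSTITUTED,  /* converted, iconv reported substitutions */
  CONV_REJECTED,     /* iconv failed with EILSEQ */
  CONV_FAILED        /* any other failure */
};

/* Hypothetical probe, not libcpp code: convert SRC from charset FROM to
   charset TO into OUT and classify what iconv did.  *OUTLEN receives the
   number of bytes produced.  */
static enum conv_result
try_convert (const char *from, const char *to,
             const char *src, size_t srclen,
             char *out, size_t outsize, size_t *outlen)
{
  *outlen = 0;
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return CONV_FAILED;

  /* iconv takes a char ** on most systems but does not modify the input.  */
  char *inp = (char *) src;
  size_t inleft = srclen;
  char *outp = out;
  size_t outleft = outsize;

  errno = 0;
  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  int saved_errno = errno;
  iconv_close (cd);
  *outlen = outsize - outleft;

  if (rc == (size_t) -1)
    return saved_errno == EILSEQ ? CONV_REJECTED : CONV_FAILED;
  return rc > 0 ? CONV_SUBSTITUTED : CONV_CLEAN;
}
```

With an iconv that behaves like the Linux output above, converting the UTF-8
bytes of á to ASCII should come back as CONV_REJECTED.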

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #1 from Jakub Jelinek  ---
> Strange.  On cfarm211 which is
> SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> the test passes.

Can you check which libiconv got picked up there?  The non-standard
OpenCSW packages on that system may include GNU libiconv and install
into default system directories, so they are picked up by default.

> You get no diagnostics for those lines at all?  Buggy libiconv?

No.  There's no separate libiconv on Solaris; the iconv* functions are
included in libc.

> I mean the emojis certainly aren't in ISO-8859-1...

Probably not ;-)

FWIW, I've just built trunk with GNU libiconv 1.17 on
i386-pc-solaris2.11.  The test PASSes now with both LANG=C and
LANG=en_US.UTF-8.

I'll dig further into Solaris iconv functions here...

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-21 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #1 from Jakub Jelinek  ---
Strange.  On cfarm211 which is
SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
the test passes.
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:7:9: warning:
multi-character character constant [-Wmultichar]
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:8:9: warning:
multi-character character constant [-Wmultichar]
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:10:9: error:
converting to execution character set: Illegal byte sequence
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:11:9: error:
named universal character escapes are only valid in C++23
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:11:9: error:
converting UCN to execution character set: Illegal byte sequence
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:13:9: error:
converting UCN to execution character set: Illegal byte sequence
...
You get no diagnostics for those lines at all?  Buggy libiconv?
I mean the emojis certainly aren't in ISO-8859-1...

[Bug c++/112652] g++.dg/cpp26/literals2.C FAILs

2023-11-21 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Rainer Orth  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0