[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #26 from Adam Wozniak ---
(In reply to Jonathan Wakely from comment #19)
> (In reply to Andreas Schwab from comment #10)
> > It is a valid preprocessing token ("non-whitespace character that cannot be
> > one of the above").
>
> Ah right, yes. It's a preprocessing token, but is never converted to a
> token, so doesn't need to be a keyword, identifier etc.

i feel like it should work for stringification reasons too. e.g.

#define X(x) #x
const char *letterA = X(A); // this works
const char *notequal = X(≠); // this does not
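For readers following along, the stringification case in this comment can be sketched as a small C++ fragment. This is only an illustration under assumptions: the variable `sum` and the helper `stringification_ok` are hypothetical names that do not appear in the report, and the `≠` line is left commented out because it is the case reported as rejected.

```cpp
#include <cstring>

// Stringification: '#x' turns the spelling of the macro argument into a
// string literal; the argument itself is never parsed as C++ code.
#define X(x) #x

const char *letterA = X(A);      // expands to "A"
const char *sum     = X(1 + 2);  // expands to "1 + 2"; nothing is evaluated
// const char *notequal = X(≠); // commented out: the case the report says fails

// Hypothetical helper: true when the expansions have the expected spelling.
bool stringification_ok() {
    return std::strcmp(letterA, "A") == 0 &&
           std::strcmp(sum, "1 + 2") == 0;
}
```

The sketch only shows that the argument to `X` survives as text, which is why the reporter expects any character to be acceptable there.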
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #24 from Adam Wozniak ---
(In reply to Jonathan Wakely from comment #23)
> (In reply to Adam Wozniak from comment #20)
> > i get this response:
> >
> > This page contains the following errors:
> > error on line 20 at column 54: AttValue: " or ' expected
> > Below is a rendering of the page up to the first error.
>
> That seems to be a problem at your end, the page is well-formed:
> https://validator.w3.org/nu/?doc=https%3A%2F%2Fgcc.gnu.org%2Fgit%2Fgitweb.
> cgi%3Fp%3Dgcc.git%3Bh%3D7d112d6670a0e0e662

works now. did not before. weird.
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #22 from Adam Wozniak ---
(In reply to Jonathan Wakely from comment #19)
> (In reply to Andreas Schwab from comment #10)
> > It is a valid preprocessing token ("non-whitespace character that cannot be
> > one of the above").
>
> Ah right, yes. It's a preprocessing token, but is never converted to a
> token, so doesn't need to be a keyword, identifier etc.

Correct.
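The point quoted in this comment, that a preprocessing token which is never converted to a token does not need the form of a keyword, identifier, literal, operator or punctuator, can be illustrated with an ASCII stand-in. The sketch below is an assumption for illustration, not something taken from the thread: it uses `@`, which is not a C++ operator or punctuator, in place of `≠`, and the names `DISCARD`, `STRINGIFY`, `at_sign` and `at_sign_ok` are hypothetical.

```cpp
#include <cstring>

#define DISCARD(x)        // the argument is dropped during preprocessing
#define STRINGIFY(x) #x   // the argument survives only inside a string literal

DISCARD(@)   // expands to nothing, so '@' never reaches the C++ parser

const char *at_sign = STRINGIFY(@);  // '@' only ever appears as "@"

// Hypothetical helper: true when the stray character was stringified as-is.
bool at_sign_ok() {
    return std::strcmp(at_sign, "@") == 0;
}
```

In both uses the character exists only as a preprocessing token, which is the situation the reporter describes for macro arguments that are thrown away or stringified.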
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #21 from Adam Wozniak ---
(In reply to Andrew Pinski from comment #16)
> It is funny arguing with folks who write parts of GCC on an idea of
> integrated vs separate preprocessor really.

yeah, i've been pounding out C since the late 80s, my dinosaur is probably
showing. they'll probably call me in 2038 like they called the old COBOL
programmers for Y2K.

it's weird to me to think of them not separately. i've even used the C
preprocessor in contexts unrelated to parsing C code.

it's also weird to see someone who thinks of the C preprocessor only in terms
of its service to the compiler.

whatever, that's drifting off topic.

main point for me was, i don't see any other reason to disallow these unicode
chars other than "the spec says so". i don't see any HARM in allowing them,
and i certainly see use cases where there is BENEFIT to allowing them.

not all macro args get turned into C++ identifiers. some get thrown away.
some get stringified. in the particular case where i tripped over this, they
get thrown away, and i have ANOTHER postprocessing step that picks them up and
does other magic stuff with them.

also, there's probably a really good case for allowing some of these things,
like emoji, as actual C++ identifiers.
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #20 from Adam Wozniak ---
(In reply to Andrew Pinski from comment #17)
> (In reply to Adam Wozniak from comment #13)
> > (In reply to Jakub Jelinek from comment #11)
> > > Bisection points to r10-3309-g7d112d6670a0e0e662
> >
> > that link gives me an error
>
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=7d112d6670a0e0e662
>
> Does that link work?

i get this response:

This page contains the following errors:
error on line 20 at column 54: AttValue: " or ' expected
Below is a rendering of the page up to the first error.
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #15 from Adam Wozniak ---
(In reply to Jonathan Wakely from comment #6)
> ≠ cannot be used in an identifier, and it's none of the other forms either.

at the risk of beating a dead horse, what you are saying here is that ≠ simply
cannot be used, ever, anywhere, in C/C++.

that seems like kind of a waste. a whole raft of unicode characters that
simply cannot be used. so much for embracing unicode.

Maybe someone wants to name a variable "§32" for some reason, but can't
because... why exactly? because the spec says so.
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #14 from Adam Wozniak ---
(In reply to Adam Wozniak from comment #12)
> (In reply to Jonathan Wakely from comment #9)
> > (In reply to Adam Wozniak from comment #8)
> > > i don't think of the preprocessor as part of the compiler.
> > > it's a different step, a different executable, that happens BEFORE the
> > > compiler.
> >
> > No it isn't. Preprocessing is done by the compiler, using libcpp. There is
> > no different executable. GCC has worked that way for many, many years.
>
>
> No, i am fairly CERTAIN they are different executables.
>
> i can even invoke one without the other; /lib/cpp can be invoked directly,
> and g++ can be told to skip the preprocessor by renaming your source file
> *.i or *.ii.
>
> $ ls -la /lib/cpp
> lrwxrwxrwx 1 root root 21 May 11 2022 /lib/cpp -> /etc/alternatives/cpp
> $ ls -la /etc/alternatives/cpp
> lrwxrwxrwx 1 root root 12 May 11 2022 /etc/alternatives/cpp -> /usr/bin/cpp
> $ ls -la /usr/bin/cpp
> lrwxrwxrwx 1 root root 6 May 11 2022 /usr/bin/cpp -> cpp-11
> $ ls -la /usr/bin/cpp-11
> lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/cpp-11 ->
> x86_64-linux-gnu-cpp-11
> $ ls -la /usr/bin/x86_64-linux-gnu-cpp-11
> -rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-cpp-11
> $ which g++
> /usr/bin/g++
> $ ls -la /usr/bin/g++
> lrwxrwxrwx 1 root root 21 May 22 16:06 /usr/bin/g++ -> /etc/alternatives/g++
> $ ls -la /etc/alternatives/g++
> lrwxrwxrwx 1 root root 15 May 22 19:31 /etc/alternatives/g++ ->
> /usr/bin/g++-11
> $ ls -la /usr/bin/g++-11
> lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/g++-11 ->
> x86_64-linux-gnu-g++-11
> $ ls -la /usr/bin/x86_64-linux-gnu-g++-11
> -rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-g++-11

lest someone claim they are the same because of identical sizes...

$ md5sum /usr/bin/x86_64-linux-gnu-g++-11
f0b26412421754aa03b9457a4d2ee40c /usr/bin/x86_64-linux-gnu-g++-11
$ md5sum /usr/bin/x86_64-linux-gnu-cpp-11
3bddc1f50d7631ad22da0f875babe7a3 /usr/bin/x86_64-linux-gnu-cpp-11
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #13 from Adam Wozniak ---
(In reply to Jakub Jelinek from comment #11)
> Bisection points to r10-3309-g7d112d6670a0e0e662

that link gives me an error
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #12 from Adam Wozniak ---
(In reply to Jonathan Wakely from comment #9)
> (In reply to Adam Wozniak from comment #8)
> > i don't think of the preprocessor as part of the compiler.
> > it's a different step, a different executable, that happens BEFORE the
> > compiler.
>
> No it isn't. Preprocessing is done by the compiler, using libcpp. There is
> no different executable. GCC has worked that way for many, many years.

No, i am fairly CERTAIN they are different executables.

i can even invoke one without the other; /lib/cpp can be invoked directly, and
g++ can be told to skip the preprocessor by renaming your source file *.i or
*.ii.

$ ls -la /lib/cpp
lrwxrwxrwx 1 root root 21 May 11 2022 /lib/cpp -> /etc/alternatives/cpp
$ ls -la /etc/alternatives/cpp
lrwxrwxrwx 1 root root 12 May 11 2022 /etc/alternatives/cpp -> /usr/bin/cpp
$ ls -la /usr/bin/cpp
lrwxrwxrwx 1 root root 6 May 11 2022 /usr/bin/cpp -> cpp-11
$ ls -la /usr/bin/cpp-11
lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/cpp-11 -> x86_64-linux-gnu-cpp-11
$ ls -la /usr/bin/x86_64-linux-gnu-cpp-11
-rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-cpp-11
$ which g++
/usr/bin/g++
$ ls -la /usr/bin/g++
lrwxrwxrwx 1 root root 21 May 22 16:06 /usr/bin/g++ -> /etc/alternatives/g++
$ ls -la /etc/alternatives/g++
lrwxrwxrwx 1 root root 15 May 22 19:31 /etc/alternatives/g++ -> /usr/bin/g++-11
$ ls -la /usr/bin/g++-11
lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/g++-11 -> x86_64-linux-gnu-g++-11
$ ls -la /usr/bin/x86_64-linux-gnu-g++-11
-rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-g++-11
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #8 from Adam Wozniak ---
(In reply to Jonathan Wakely from comment #6)
> That isn't the point. The compiler has to tokenize the input in order to
> perform the preprocessing step. That means it has to be able to decide what
> the bytes comprising the ≠ mean. Are they multiple tokens? A single token
> consisting of an identifier? A C++ operator?
>
> The standard says "Each preprocessing token that is converted to a token
> (5.6) shall have the lexical form of a keyword, an identifier, a literal, or
> an operator or punctuator."
>
> ≠ cannot be used in an identifier, and it's none of the other forms either.
>
> > it should be perfectly legal to use these as arguments.
>
> By that argument, you could say X(£), but that isn't allowed either.
>
> > note the emoji passes through flawlessly.
>
> Not with -Wpedantic

i would argue that X(£) should also be allowed.

i don't think of the preprocessor as part of the compiler. it's a different
step, a different executable, that happens BEFORE the compiler. hence the
name, PREprocessor.

i cannot argue with "the standard", however.
[Bug c++/109936] error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Adam Wozniak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |---
             Status|RESOLVED                    |UNCONFIRMED
            Version|11.3.0                      |12.1.0

--- Comment #4 from Adam Wozniak ---
reopening. this is not at all "expected".

C++ papers P1041R4 and P1139R2 cover literal constants in code. they do not
at all cover anything about arguments to C preprocessor macros.

in this case, the macro generates no code. it should be perfectly legal to
use these as arguments.

note the emoji passes through flawlessly.

bug also exists in 12.1.0, so updating "Version".
[Bug c++/109936] New: error: extended character ≠ is not valid in an identifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

            Bug ID: 109936
           Summary: error: extended character ≠ is not valid in an identifier
           Product: gcc
           Version: 11.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adam at wozniakconsulting dot com
  Target Milestone: ---

Created attachment 55138
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55138&action=edit
cpp file that demonstrates bug

#define X(x)
X(🤔) // emojis work
X(≠) // this "not equal" does NOT work!
///
#if 0
compile with "g++ -c bad.cpp" gives:

bad.cpp:3:3: error: extended character ≠ is not valid in an identifier
    3 | X(≠)
      |   ^

compile with "g++ -c -fextended-identifiers bad.cpp" gives the same error.

g++ --version says:

g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/lib/cpp --version says:

cpp (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

manual for both gcc and cpp says:

       -fextended-identifiers
           Accept universal character names and extended characters in
           identifiers. This option is enabled by default for C99 (and later
           C standard versions) and C++.

BTW, i get similar error with the following unicode code points. while some
may have reasonable explanations, many do not.

0080 - 00a7
00a9
00ab - 00ac
00ae
00b0 - 00b1
00b6
00bb
00bf
00d7
00f7
0300 - 036f
1680
180e
1dc0 - 1dff
2000 - 200a
200e - 2029
202f - 203e
2041 - 2053
2055 - 205f
20d0 - 20ff
2190 - 245f
2500 - 2775
2794 - 2bff
2e00 - 2e7f
3000 - 3003
3008 - 3020
3030
e000 - f8ff
fdd0 - fdef
fe20 - fe2f
fe45 - fe46
#endif