[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #47 from GCC Commits ---
The releases/gcc-13 branch has been updated by Jonathan Yong:

https://gcc.gnu.org/g:19da6d2d0048eb6a260a5cf8af707cb455848bfb

commit r13-8107-g19da6d2d0048eb6a260a5cf8af707cb455848bfb
Author: Costas Argyris
Date:   Mon Nov 20 17:58:16 2023 +

    mingw: Exclude utf8 manifest [PR70, PR108865]

    Make the utf8 manifest optional (on by default and explicitly off
    with --disable-win32-utf8-manifest) in the mingw hosts.

    Also eliminate duplication between the 32-bit and 64-bit mingw hosts
    by putting them both in the same branch and special-case only the
    64-bit long long setting.

    PR mingw/70
    PR mingw/108865

    Signed-off-by: Costas Argyris
    Signed-off-by: Jonathan Yong <10wa...@gmail.com>

    gcc/Changelog:

    * configure.ac: Handle new --enable-win32-utf8-manifest option.
    * config.host: allow win32 utf8 manifest to be disabled by user.
    * configure: Regenerate.

    (cherry picked from commit 4f1ebd54380e16927cd0085be939165870354eac)
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #46 from CVS Commits ---
The master branch has been updated by Jonathan Yong:

https://gcc.gnu.org/g:4f1ebd54380e16927cd0085be939165870354eac

commit r14-5768-g4f1ebd54380e16927cd0085be939165870354eac
Author: Costas Argyris
Date:   Mon Nov 20 17:58:16 2023 +

    mingw: Exclude utf8 manifest [PR70, PR108865]

    Make the utf8 manifest optional (on by default and explicitly off
    with --disable-win32-utf8-manifest) in the mingw hosts.

    Also eliminate duplication between the 32-bit and 64-bit mingw hosts
    by putting them both in the same branch and special-case only the
    64-bit long long setting.

    PR mingw/70
    PR mingw/108865

    Signed-off-by: Costas Argyris
    Signed-off-by: Jonathan Yong <10wa...@gmail.com>

    gcc/Changelog:

    * configure.ac: Handle new --enable-win32-utf8-manifest option.
    * config.host: allow win32 utf8 manifest to be disabled by user.
    * configure: Regenerate.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #45 from Costas Argyris ---
Created attachment 56653
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56653&action=edit
Introduce configure option --disable-win32-utf8-manifest

Thanks for the pointers.

I attach a patch that disables the utf8 manifest with the configure option --disable-win32-utf8-manifest.

To prevent (even more) duplication between the two mingw host branches, I merged them into one since most of their settings are identical.

I tested this by building gcc natively on x86_64-w64-mingw32 via MSYS2 (with and without --disable-win32-utf8-manifest), but would appreciate it if the people who reported the problems due to the utf8 manifest also tested it for their use-cases.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Alexandre Oliva changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org

--- Comment #44 from Alexandre Oliva ---
All configure --with-* and --enable-* options are stored in shell variables named with_* and enable_*, respectively, so it's just a matter of testing for yes (for --enable) or rather for no (for explicit --disable).

Look for e.g. --enable-initfini-array around line 1932 in $top_srcdir/gcc/configure.ac, or with_avrlibc around line 1495 in gcc/config.gcc; you can do something similar in gcc/config.host, without any explicit argument passing, because the config.{host,gcc} files are sourced by configure, so they inherit all shell variables.
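
The pattern described in comment #44 can be sketched roughly as follows. This is an illustration only, not the committed patch: the function name `utf8_manifest_objs` and the object file name are hypothetical, and the value passed in merely stands in for the `enable_win32_utf8_manifest` variable that configure would set for this option under the usual naming convention.

```shell
#!/bin/sh
# Sketch only: mimics how a file sourced by configure can react to an
# --enable-*/--disable-* option through the inherited enable_* variable.
# The function name and the object file name are illustrative.

utf8_manifest_objs () {
  # $1 stands in for $enable_win32_utf8_manifest:
  #   "yes" for --enable-..., "no" for --disable-..., empty if neither given.
  if test "x$1" != xno; then
    # Unset (default) and explicit "yes" both keep the manifest object.
    echo "utf8-manifest-example.o"
  fi
}

# Simulate the three cases configure could produce.
for setting in "" yes no; do
  printf 'enable=%s objs=%s\n' "${setting:-unset}" "$(utf8_manifest_objs "$setting")"
done
```

Testing for "not no" rather than for "yes" is what makes the feature on by default: only an explicit --disable turns it off.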
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #43 from Eric Botcazou ---
> Looks like what is being requested here is a windows-host-specific
> configuration option similar to the existing --disable-win32-registry, like
> for example --disable-win32-utf8-manifest with its corresponding
> --enable-win32-utf8-manifest (default).

Yes, this sounds like a good idea.

> Then the question is what can be done in configure.ac to raise some sort of
> flag that can be picked up from gcc/config.host, which is where the utf8
> resource objects get added. According to its own doc, config.host is
> invoked by configure, so it should be possible to pass a simple flag from
> configure to config.host. Perhaps setting a shell variable that can be
> checked inside config.host, like ${ENABLE_WIN32_UTF8_MANIFEST}, and pull in
> the utf-8 files only if that is true.

Yes, see the list of variables documented at the beginning of config.host for examples.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #42 from Costas Argyris ---
Looks like what is being requested here is a windows-host-specific configuration option similar to the existing --disable-win32-registry, like for example --disable-win32-utf8-manifest with its corresponding --enable-win32-utf8-manifest (default).

win32-registry is handled in gcc/configure.ac

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/configure.ac;h=d0caf820648e791272e91ac3eb14a62d034e8629;hb=HEAD#l2335

Then the question is what can be done in configure.ac to raise some sort of flag that can be picked up from gcc/config.host, which is where the utf8 resource objects get added. According to its own doc, config.host is invoked by configure, so it should be possible to pass a simple flag from configure to config.host. Perhaps setting a shell variable that can be checked inside config.host, like ${ENABLE_WIN32_UTF8_MANIFEST}, and pull in the utf-8 files only if that is true.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #41 from Eric Botcazou ---
> I'm curious though: How come it took so long to report this one? Is this
> a rarely-used feature of the Ada compiler? It seems strange that a
> feature of the compiler would interact so strongly with the active code page
> being used by the compiler process at runtime.

Yes, it's essentially there only to pass ACATS (the infamous c250002 test).
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #40 from Costas Argyris ---
(In reply to Eric Botcazou from comment #39)
> FWIW this also breaks the GNAT_CODE_PAGE feature of the Ada compiler (which
> is arguably a kludge) so providing a configure option to revert to the old
> setting would indeed be in order here.

I can try coming up with a patch to introduce a new windows-host-specific configure option to exclude the utf8 manifest on demand, but that won't be for a while...

I'm curious though: How come it took so long to report this one? Is this a rarely-used feature of the Ada compiler? It seems strange that a feature of the compiler would interact so strongly with the active code page being used by the compiler process at runtime.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Eric Botcazou changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org

--- Comment #39 from Eric Botcazou ---
FWIW this also breaks the GNAT_CODE_PAGE feature of the Ada compiler (which is arguably a kludge) so providing a configure option to revert to the old setting would indeed be in order here.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #38 from LIU Hao ---
(In reply to Andrew Pinski from comment #35)
> (In reply to peter0x44 from comment #34)
> > Unfortunately, this option breaks GCC running under Windows XP.
>
> XP has not been supported by mingw for a long time so I have no idea how you
> have been building there.

In the last year, some efforts have been made to ensure that the mingw-w64 CRT no longer references symbols that did not exist on XP (e.g. to provide our alternative when there was no `llabs()`). While I didn't test it (I do not have XP installed on any device), it was said to make GNU toolchains produce executables that run on XP.

I'd say XP is still sort of 'supported'. However, one still has to take care, for example, not to pass `%lld` to `printf()`.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #37 from peter0x44 at disroot dot org --- Sorry, comment got sent by accident, before I was done typing. And you replied before I finished, too. Thanks for pointing me in the correct direction. === IRRELEVANT Executables with this embedded UTF-8 manifest don't run on Windows XP. The OS just reports "The system cannot execute the specified program." Patching GCC to remove it does make it work and run fine, simply remove all the contents of winnt-utf8.manifest, and it will work again. Related issue: https://github.com/skeeto/w64devkit/issues/58 I'm not sure if anyone cares, since it is a very old and unsupported OS, but I figured it was worth a little further discussion. Perhaps making disabling it a configure option would be feasible. https://nullprogram.com/blog/2020/05/04/ https://nullprogram.com/blog/2021/12/30/ Here is some useful info that may help in choosing a more optimal solution. ===
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #36 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #35)
> (In reply to peter0x44 from comment #34)
> > Unfortunately, this option breaks GCC running under Windows XP.
>
> XP has not been supported by mingw for a long time so I have no idea how you
> have been building there.

Anyways this was reported as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70 already.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #35 from Andrew Pinski ---
(In reply to peter0x44 from comment #34)
> Unfortunately, this option breaks GCC running under Windows XP.

XP has not been supported by mingw for a long time so I have no idea how you have been building there.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

peter0x44 at disroot dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter0x44 at disroot dot org

--- Comment #34 from peter0x44 at disroot dot org ---
Unfortunately, this option breaks GCC running under Windows XP.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #33 from Costas Argyris ---
It should be noted that with the current implementation, windres (part of binutils) is mandatory when building for the mingw (Windows) hosts, both 32 and 64-bit versions. That is, a build failure will occur if windres is not found for the mingw hosts.

This means that for these hosts, gcc will *always* be built with UTF-8 as its active code page on Windows, thereby eliminating the need to have a way to query the active code page as a user.

If, for example, it could be built either with or without windres, then the active code page would also be conditional on that, so users would need a way to tell what is the active code page being used by a given gcc.exe or g++.exe executable. By having windres be a mandatory build tool for the mingw hosts, this is not a requirement because the answer will always be UTF-8 (otherwise the build would have failed).

This is all relevant for gcc 13 or later (as per Target Milestone above) and a minimum Windows Version 1903 (May 2019 Update). If gcc is 13 or later but the Windows version is earlier than the minimum target version, gcc will not be using UTF-8 as its active code page on its own. It will still be possible to make it do so, though, by applying the UTF-8 manifest with mt.exe manually, or by checking the Windows checkbox that sets UTF-8 globally.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #32 from Costas Argyris --- Followed by: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e70e36cbef4f01e7d32bafe17698c3bf3e4624b8
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #31 from Costas Argyris --- This was initially done only for the 64-bit mingw Windows host (x86_64-*-mingw*). This is the patch that extended it to the 32-bit version as well (i[34567]86-*-mingw32*): https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=304c7d44a2212e6fd618587331cea2c266dc10bf
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jdx at o2 dot pl

--- Comment #30 from Richard Biener ---
*** Bug 109188 has been marked as a duplicate of this bug. ***
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #29 from Costas Argyris --- patch that makes symbol optional was pushed to master: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=08ef17c75777ef9e4e7ead132ccd7a6d03ae6020
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Costas Argyris changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64-w64-mingw32          |

--- Comment #28 from Costas Argyris ---
Patch also worked for 109188 (should be duplicate of this I think), as confirmed by the original reporter.

Shall we move ahead with this patch then?
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #27 from Costas Argyris --- Good to hear. FYI, the driver programs (gcc.exe and g++.exe) should also have it (they do in my builds, both native and cross).
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #26 from LIU Hao --- (In reply to Costas Argyris from comment #23) > Created attachment 54730 [details] > Make symbol optional > > Could you please try this patch? Works for me. I have checked that cpp.exe, cc1.exe, cc1plus.exe all contain the desired UTF-8 manifest in their resources.
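
One quick way to run a similar check can be sketched as below. This is a heuristic under stated assumptions, not the method used in comment #26: it assumes the manifest is stored as plain text inside the executable and simply searches the file's bytes for the `activeCodePage` setting name, without parsing the PE resource table. The function name `has_utf8_manifest` is hypothetical.

```shell
#!/bin/sh
# Sketch only: heuristic check that a file contains the active-code-page
# setting name somewhere in its bytes. It does not parse the PE resource
# table, so treat a hit as a hint, not proof.

has_utf8_manifest () {
  # -a makes grep treat binary input as text (GNU and BSD grep).
  LC_ALL=C grep -a -q 'activeCodePage' "$1" 2>/dev/null
}

# Usage: pass executables on the command line, e.g. cc1.exe cc1plus.exe
for exe in "$@"; do
  if has_utf8_manifest "$exe"; then
    echo "$exe: manifest text found"
  else
    echo "$exe: manifest text not found"
  fi
done
```

A resource viewer or a tool that walks the resource directory would give a more reliable answer; the search above only confirms that the text is present somewhere in the file.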
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #25 from Costas Argyris ---
Some more specific info:

Host x86_64-w64-mingw32 in general didn't fail. What failed was building it as an MSYS2 package using the PKGBUILD script. For example, cross-compiling with standard configure + make didn't fail.

On the reason of the MSYS2 package build failure:

When building using that approach, the following executables

    build/genchecksum.exe
    build/genmodes.exe
    build/genversion.exe
    build/gengenrtl.exe

are borrowing the $(COMPILERS) flags, so this included

    -Wl,--require-defined=HOST_EXTRA_OBJS_SYMBOL

because of

    +$(COMPILERS) : override LDFLAGS += -Wl,--require-defined=HOST_EXTRA_OBJS_SYMBOL

Since '--require-defined' results in an error if the symbol is not found, the failure happens. This shouldn't be an error though, because this flag was only meant for the compilers, hence the $(COMPILERS) variable. I don't know why these executables use the compiler flags in this build setup. It didn't happen when cross-compiling using configure + make.

The proposed patch simply switches '--require-defined' to '--undefined' and makes the symbol definition optional, so these executables don't fail to build. The compilers will still pull it in, so we still get the UTF-8 feature.

With the proposed patch, the MSYS2 gcc package builds fine. Also confirmed for the cross-compilation case with configure + make.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #24 from LIU Hao --- (In reply to Costas Argyris from comment #23) > Created attachment 54730 [details] > Make symbol optional > > Could you please try this patch? Didn't test this completely, but it did allow the build to continue. The error was caused by the fact that `sym-mingw32.o` was not built. Also the variable in `sym-mingw32.cc` had better have `extern "C"`.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #23 from Costas Argyris ---
Created attachment 54730
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54730&action=edit
Make symbol optional

Could you please try this patch?
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

LIU Hao changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lh_mouse at 126 dot com

--- Comment #22 from LIU Hao ---
This causes x86_64-w64-mingw32 to fail:

```
C:/MSYS2/mingw64/lib/gcc/x86_64-w64-mingw32/13.0.1/../../../../x86_64-w64-mingw32/bin/ld.exe: required symbol `HOST_EXTRA_OBJS_SYMBOL' not defined
collect2.exe: error: ld returned 1 exit status
```
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |13.0

--- Comment #21 from Andrew Pinski ---
Fixed.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #20 from CVS Commits ---
The master branch has been updated by Jonathan Yong:

https://gcc.gnu.org/g:d11e088210a551235d3937f867ee1c8b19d02290

commit r13-6552-gd11e088210a551235d3937f867ee1c8b19d02290
Author: Costas Argyris
Date:   Tue Feb 28 17:10:18 2023 +

    Enable UTF-8 code page on Windows 64-bit host [PR108865]

    Compile a resource object that contains the utf8 manifest.
    Then link that object into the driver and compiler proper.

    For compiler proper the link has to be forced because the resource
    object file gets into a static library (libbackend.a) and gets
    eventually dropped because it has no symbols of its own and nothing
    is referencing it inside the library. Therefore, an artificial
    symbol is planted to force the link.

    gcc/ChangeLog:

    PR driver/108865
    * config.host: add object for x86_64-*-mingw*.
    * config/i386/sym-mingw32.cc: dummy file to attach symbol.
    * config/i386/utf8-mingw32.rc: windres resource file.
    * config/i386/winnt-utf8.manifest: XML manifest to enable UTF-8.
    * config/i386/x-mingw32: reference to x-mingw32-utf8.
    * config/i386/x-mingw32-utf8: Makefile fragment to embed UTF-8 manifest.

    Signed-off-by: Jonathan Yong <10wa...@gmail.com>
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Costas Argyris changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #54589|0                           |1
        is obsolete|                            |

--- Comment #19 from Costas Argyris ---
Created attachment 54594
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54594&action=edit
Improve patch structure

This improves the modularity of the patch by isolating the utf8-related instructions into their own separate makefile fragment 'x-mingw32-utf8' which gets referenced as an additional host_xmake_file, leaving the main x-mingw32 file intact (well, except a minor irrelevant opportunistic cleanup).
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #18 from Andrew Pinski ---
(In reply to Costas Argyris from comment #17)
> Created attachment 54589 [details]
> Link utf8 resource object in both driver and compiler
>
> The proposed patch addresses the issue of the resource (utf8) object file
> getting dropped from the compiler link due to not having any references to
> it inside libbackend.a, by creating an artificial symbol and requiring it to
> be present in the compiler link.
>
> More precisely, a dummy object file is created with the dummy symbol, and
> that object file gets combined with the resource object into a third object
> file that has both the utf8 resource and the symbol.

Neat trick.

> That way, by requiring the symbol to be defined at compiler link we are
> forcing the resource object to be linked into the compiler proper.
>
> The driver has no such issue as it doesn't link to the resource object
> through a static library, but as an object directly (this was already
> working from the previous patch).
>
> This now works end-to-end, since I was able to fully use both gcc and g++
> with unicode paths and it succeeded. So both drivers (gcc.exe and
> g++.exe) and compilers (cc1.exe and cc1plus.exe) handled the unicode paths
> just fine.

That is good news.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Costas Argyris changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #54559|0                           |1
        is obsolete|                            |

--- Comment #17 from Costas Argyris ---
Created attachment 54589
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54589&action=edit
Link utf8 resource object in both driver and compiler

The proposed patch addresses the issue of the resource (utf8) object file getting dropped from the compiler link due to not having any references to it inside libbackend.a, by creating an artificial symbol and requiring it to be present in the compiler link.

More precisely, a dummy object file is created with the dummy symbol, and that object file gets combined with the resource object into a third object file that has both the utf8 resource and the symbol.

That way, by requiring the symbol to be defined at compiler link we are forcing the resource object to be linked into the compiler proper.

The driver has no such issue as it doesn't link to the resource object through a static library, but as an object directly (this was already working from the previous patch).

This now works end-to-end, since I was able to fully use both gcc and g++ with unicode paths and it succeeded. So both drivers (gcc.exe and g++.exe) and compilers (cc1.exe and cc1plus.exe) handled the unicode paths just fine.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #16 from Andrew Pinski ---
(In reply to Costas Argyris from comment #15)
> Sounds like I am hitting a separate existing limitation that has nothing to
> do with this bug.
>
> Do we need a new bug report for that one then?

No, one bug report is enough really in this case. It should not be hard to come up with a secondary patch which fixes that issue. I might give it a go in a few weeks if someone has not already.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #15 from Costas Argyris ---
Sounds like I am hitting a separate existing limitation that has nothing to do with this bug.

Do we need a new bug report for that one then?

FWIW, gcc/config.host wasn't doing anything with host_extra_objs before this patch (it was simply empty), so it makes sense that this issue was hidden until now.

host_extra_objs is being used here

https://github.com/gcc-mirror/gcc/blob/master/gcc/configure#L12796
https://github.com/gcc-mirror/gcc/blob/master/gcc/configure.ac#L1843

What kind of work would be required to fix it? Can we get away with simply creating a dummy reference that would force the linker to include it?
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #14 from Andrew Pinski --- So the problem is host_extra_objs gets included in libbackend.a but since nothing references it inside the static library, it does not get linked into the cc1 ... Looks like other changes are needed to fix host_extra_objs issue here ...
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #13 from Costas Argyris ---
With the changes in the attached patch, the utf8 object file gets linked into gcc.exe but not cc1.exe. How can I achieve this? Basically this object file has to be linked into pretty much every executable as far as I can tell: we want all of them to use the same encoding (although it is only strictly necessary for those that take user-provided paths).

gcc/config.host says:

# host_extra_objs       List of extra host-dependent objects that should
#                       be linked into the compiler proper.
#
# host_extra_gcc_objs   List of extra host-dependent objects that should
#                       be linked into the gcc driver.

As seen in the patch, I added the new .o file in both of these variables. The driver certainly took it, but cc1 did not. From that description I take it that host_extra_objs should have done it, no?

I looked into the libcc1 folder but there is no config.host file there, or anything that looks like it enables host-specific configuration.

Any thoughts on how I can extend the scope of where the new object file gets linked in, to cover at least cc1 and possibly more?
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #12 from Costas Argyris --- Sent email to binutils about possible windres issue/limitation: https://sourceware.org/pipermail/binutils/2023-March/126361.html
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #11 from Costas Argyris ---
Capturing another data point:

I hex-compared an executable before and after applying the UTF-8 manifest with mt.exe just to try and see what it does, and I noticed a few things:

1) The executable size was almost cut in half. It was compiled with gcc in the first place, so maybe it stripped the debug symbols?
2) There were various changes throughout it, not just the part of the xml manifest that was embedded.
3) The format changed to the point that gdb was no longer able to understand it. Before applying the manifest with mt.exe, I was able to load it into gdb and debug it; afterwards I could not, as gdb complains about an unknown format.

Of course, it still runs fine after those changes.

So clearly mt.exe does a lot more to the executable than just link in the manifest. I would expect that something similar happens when the manifest gets integrated at build time with MSVC.

It doesn't look like it's a simple "just compile and link the resource file" case, as that seems to be only part of what is necessary, not the entire procedure.

So with my current understanding of the situation I think that the plan of integrating the UTF-8 manifest at gcc build time with GNU tools is simply not possible.

I would love to be proven wrong, but if it's not happening we either have to go for another approach, or just accept that gcc will not support Unicode paths on Windows (in which case we could at least copy the instructions to do it with mt.exe in some wiki guide).
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #10 from Costas Argyris ---
The only interesting bit I found there was the shell script that gets called before actually running windres:

https://github.com/jbruchon/jdupes/blob/master/Makefile#L201

which is doing some setup:

https://github.com/jbruchon/jdupes/blob/master/tune_winres.sh

It is changing some version information, but I don't think this is relevant to the problem I am having.

Actually, to be more precise, what I did did not fail completely, which makes this even stranger: I have a custom tool I created a while ago to which you pass the path of a Windows executable and it tells you the active code page it is using, and this tool actually reports the correct UTF-8 code page when I use the patch I posted. So it looks like it worked at first, but the arguments passed to the executable are still destroyed before main has a chance to do anything with them.

It is like the executable itself is successfully converted to use UTF-8, but the setup done by the OS before reaching the entry point (main) hasn't been done properly, so the args never reach main properly. I suspect this is the part that the ms tools do that we don't.

It makes some sense because on this particular problem, it is the arguments passed to the program that matter as well, not only the program itself. Perhaps the ms tools do some more work on the executable (besides just linking in the manifest) that signifies to the OS loader that the args passed to it must also be interpreted as UTF-8. If such a thing is happening, our linking of the object resource file would never accomplish that I think.

On another note, that program doesn't need to use the UTF-8 manifest because apparently it is using the wmain approach to get UTF-16 wide strings and converts them to char-based UTF-8, which wasn't a very good solution for gcc due to impact on the rest of the programs it spawns:

https://github.com/jbruchon/jdupes/blob/master/jody_win_unicode.c
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #9 from Andrew Pinski ---
https://github.com/jbruchon/jdupes

This suggests it is definitely possible. That program includes a manifest that says that the program supports long file names.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #8 from Andrew Pinski ---
(In reply to Costas Argyris from comment #7)
> I couldn't find examples online for doing this. There are examples of
> compiling and linking resource files in general using GNU tools, but not a
> resource file that just references a manifest xml file. So unless someone
> has some deeper knowledge on how to do this, I seem to be blocked atm.

Thanks for doing this at least this far. At least we know the next step/fix really and hopefully someone else can figure out the correct change to do the rest of the way.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #7 from Costas Argyris ---
I think the problem is that the embedding of the manifest into the executable is a very low-level process that depends on ms specifics that mt.exe (or VS) knows about and windres + link doesn't.

For example, by inspecting an executable patched with mt.exe through a hex editor, one can see that there is some padding involved. This is mentioned here:

http://www.vbaccelerator.com/home/VB/Code/Libraries/XP_Visual_Styles/Using_XP_Visual_Styles_in_VB/article.asp

"For some bizarre reason, you must also ensure that the resulting XML file is an even multiple of 4 bytes long. So for example, if your file is actually 597 bytes you need to add padding spaces to make up the file size to 600 bytes before compiling."

However, even after doing that I wasn't able to get it to work. I think the proper ms way does more to the binary than just embed the manifest and pad, which is not done by the windres + link approach.

I couldn't find examples online for doing this. There are examples of compiling and linking resource files in general using GNU tools, but not a resource file that just references a manifest xml file. So unless someone has some deeper knowledge on how to do this, I seem to be blocked atm.
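
The padding arithmetic in the quoted advice can be sketched as follows. It only illustrates the rounding to a multiple of 4 bytes; as comment #7 notes, padding alone did not make the manifest work. The function name `manifest_padding` is hypothetical.

```shell
#!/bin/sh
# Sketch only: number of padding bytes needed to round a manifest
# size up to the next multiple of 4, as the quoted article advises.

manifest_padding () {
  # $1 = current file size in bytes
  echo $(( (4 - $1 % 4) % 4 ))
}

size=597
pad=$(manifest_padding "$size")
echo "size=$size padding=$pad padded=$((size + pad))"
# prints: size=597 padding=3 padded=600
```

The outer `% 4` keeps the result at 0 for sizes that are already a multiple of 4, so no padding is added in that case.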
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #6 from Costas Argyris ---
Created attachment 54559
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54559&action=edit
Integrate UTF-8 manifest into gcc's build process for mingw host

This builds fine and the resource object does get linked into the final gcc.exe; however, it still doesn't work and still breaks when fed with a Unicode source path.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-02-25

--- Comment #5 from Andrew Pinski ---
.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #4 from Costas Argyris ---
Using the manifest approach described in:

https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

it is possible to convert a full existing gcc + mingw-w64 toolchain (all executables) to use UTF-8 as their active code page.

The proper solution would be to integrate the UTF-8 manifest into gcc's own build process. Until that happens, and to enable existing installations to work with Unicode paths, this is the procedure to convert an existing gcc (mingw-w64) installation on Windows to use the UTF-8 code page.

Requirements:

1) See above link to check if your version of Windows supports this.
2) You must have the manifest tool mt.exe installed and know its location.
3) Go to a temp dir and create a 'utf8_acp_setting.manifest' file with this content:

    [...]
    <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    [...]

Assume that the current installation is at C:\mingw64. We are going to create a copy of it in C:\mingw64-UTF8 and apply the UTF-8 manifest in every executable using mt.exe.

Add the folder of mt.exe to the path, for example

    set PATH=C:\Program Files (x86)\Windows Kits\10\bin\10.0.19041.0\x64;%PATH%

('where mt' should find it)

Copy the entire C:\mingw64 directory to C:\mingw64-UTF8 from the UI or using

    robocopy C:\mingw64 C:\mingw64-UTF8 /e

Cd into the folder where utf8_acp_setting.manifest is and run:

    for /F %f in ('dir /B /S C:\mingw64-UTF8\*.exe') do mt "-outputresource:%f;1" -manifest "utf8_acp_setting.manifest"

After this, the toolchain under C:\mingw64-UTF8 should be able to compile the file that was previously failing.

Make sure that you add C:\mingw64-UTF8\bin to the path instead of C:\mingw64\bin

    set PATH=C:\mingw64-UTF8\bin;%PATH%

and check with 'where gcc' - it should return the one under C:\mingw64-UTF8\bin

Now compile the file in the Unicode path that was previously failing:

    C:\Users\cargyris\temp>gcc ﹏\src.c

    C:\Users\cargyris\temp>echo %errorlevel%
    0

no errors this time.
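
The per-executable mt.exe step above could also be scripted. The following is a rough sketch only: the function name mt_commands is hypothetical, the mt arguments are copied from the command in this comment, and nothing is executed by the function itself.

```python
from pathlib import Path


def mt_commands(root: str, manifest: str = "utf8_acp_setting.manifest") -> list:
    """Build one mt.exe argument list per *.exe found under `root`.

    Hypothetical helper: it only assembles the command lines shown in the
    procedure above and does not run mt.exe.
    """
    return [
        ["mt", f"-outputresource:{exe};1", "-manifest", manifest]
        for exe in sorted(Path(root).rglob("*.exe"))
    ]


# Possible use on Windows, with mt.exe on PATH (assumption, not run here):
#   import subprocess
#   for cmd in mt_commands(r"C:\mingw64-UTF8"):
#       subprocess.run(cmd, check=True)
```

Each command is passed as an argument list, so executable paths that contain spaces stay intact as a single argument.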
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #3 from Andrew Pinski ---
(In reply to Costas Argyris from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Utf8 is the best generic solution really.
> > Using wmain is not very portable and the rest of gcc's sources can't use
> > wchar_t as that would break unix/Linux handling.
>
> Yes, on that, I was thinking to only use wchar_t in wmain just to get the
> arguments properly (not destroyed), and immediately convert to UTF-8 char
> arrays to pass to the rest of the program (starting with the call to
> driver.main which main wraps). That way, all sources would stay the same
> working with char arrays, only this time it would be UTF-8 char arrays that
> properly carry the Unicode args. This would allow only selected parts of
> the Windows-specific code (possibly only in libiberty/pex-win32.c) to opt-in
> for the necessary conversion back to wchar_t UTF-16 arrays in order to call
> the Unicode versions of Win32 APIs like CreateProcessW etc., and get
> end-to-end Unicode support on Windows.

I think that is a bad solution in general. Just use utf8 like every other target would.
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

--- Comment #2 from Costas Argyris ---
(In reply to Andrew Pinski from comment #1)
> Utf8 is the best generic solution really.
> Using wmain is not very portable and the rest of gcc's sources can't use
> wchar_t as that would break unix/Linux handling.

Yes, on that, I was thinking to only use wchar_t in wmain just to get the arguments properly (not destroyed), and immediately convert to UTF-8 char arrays to pass to the rest of the program (starting with the call to driver.main which main wraps). That way, all sources would stay the same working with char arrays, only this time it would be UTF-8 char arrays that properly carry the Unicode args.

This would allow only selected parts of the Windows-specific code (possibly only in libiberty/pex-win32.c) to opt-in for the necessary conversion back to wchar_t UTF-16 arrays in order to call the Unicode versions of Win32 APIs like CreateProcessW etc., and get end-to-end Unicode support on Windows.
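
To make the data flow concrete, here is a small sketch of the round trip described above. It is not gcc code: the function names are hypothetical, UTF-16LE bytes stand in for the wchar_t argv that wmain receives, and cp1252 is only an assumed example of a legacy ANSI code page.

```python
def wide_to_utf8(wide_arg: bytes) -> bytes:
    """UTF-16LE argument (stand-in for wchar_t argv) -> UTF-8 char array."""
    return wide_arg.decode("utf-16-le").encode("utf-8")


def utf8_to_wide(utf8_arg: bytes) -> bytes:
    """UTF-8 char array -> UTF-16LE, as needed before calling a *W Win32 API."""
    return utf8_arg.decode("utf-8").encode("utf-16-le")


path = "﹏\\src.c"                    # a path with a non-ANSI character
wide = path.encode("utf-16-le")      # what wmain would be handed
utf8 = wide_to_utf8(wide)            # what the rest of the program would see

# The UTF-8 form converts back without loss.
assert utf8_to_wide(utf8) == wide

# Going through an ANSI code page instead destroys the character
# (cp1252 is an assumption; the actual code page depends on the system).
lossy = path.encode("cp1252", errors="replace")
print(lossy)  # b'?\\src.c'
```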
[Bug driver/108865] gcc on Windows fails with Unicode path to source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 --- Comment #1 from Andrew Pinski --- Utf8 is the best generic solution really. Using wmain is not very portable and the rest of gcc's sources can't use wchar_t as that would break unix/Linux handling.