Re: [Mingw-w64-public] Encoding problem with __FILE__ macro
Putting source files in anything but ASCII folders is asking for trouble. Lots of trouble. Just don't.

On Sat, Jan 11, 2014 at 2:38 PM, lh_mouse wrote:
> The problem happens with the encoding of the source file's path, not the
> file's contents.

___
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
Re: [Mingw-w64-public] Encoding problem with __FILE__ macro
The problem happens with the encoding of the source file's path, not the file's contents.

Anyway, I agree with you that coding in plain English is a good habit. But in some situations the file's path inevitably ends up in the program, e.g. when you use the assert() macro.

2014-01-11
lh_mouse
Re: [Mingw-w64-public] Encoding problem with __FILE__ macro
2014/1/6 lh_mouse
> Hi guys, I have noticed that in mingw-gcc, the __FILE__ macro expands
> using the system's code page encoding (e.g. code page 936 on Simplified
> Chinese Windows systems). [...]
> What do you think about this?

If GCC was built with iconv support, you can use the -finput-charset option to set your file's encoding so that the compiler reads it properly. I'm not quite sure how the __FILE__ macro will be affected.

In general, it is always better to write code in plain English, so as to avoid these issues ;-)

See the manual for more information: http://gcc.gnu.org/onlinedocs/cpp/Invocation.html

Cheers,
Ruben
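A possible way to follow the plain-ASCII advice while still producing the character from the original post (lh_mouse, 2014-01-06) is a universal character name. This sketch was not proposed in the thread, and the names kMeow and print_meow are hypothetical. `\u55B5` denotes the code point U+55B5 (喵). Because the source bytes are pure ASCII, the resulting wide literal should not depend on -finput-charset or on the code page the file was saved in. Narrow literals spelled this way are still encoded in GCC's execution character set (UTF-8 unless changed with -fexec-charset), so this mainly helps the wide-literal approach the original post settles on.

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical example: the literal is spelled entirely in ASCII.
// \u55B5 is a universal character name, so the compiler does not need to
// know the source file's encoding to produce the wide character U+55B5.
constexpr wchar_t kMeow[] = L"\u55B5";

// Writes the wide string to stdout. Whether it displays correctly still
// depends on the console's font and code page.
void print_meow() {
    std::fputws(kMeow, stdout);
    std::fputws(L"\n", stdout);
}
```

U+55B5 lies in the Basic Multilingual Plane, so it should occupy a single wchar_t both with MinGW's 16-bit wchar_t and with a 32-bit wchar_t elsewhere.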
[Mingw-w64-public] Encoding problem with __FILE__ macro
Hi guys,

I have noticed that in mingw-gcc, the __FILE__ macro expands using the system's code page encoding (e.g. code page 936 on Simplified Chinese Windows systems). Here is how it causes problems.

When a multibyte (MBCS) string literal is used in a source file, GCC does not check whether it is valid; it simply copies the bytes. This is the opposite of MSVC, which converts UTF-8 string literals to code page string literals. So the following code works in both mingw-gcc and MSVC when saved as ANSI text:

    std::puts("喵"); // "\xDF\xF7" in CP936

But the following code does NOT compile with GCC:

    std::fputws(L"喵", stdout); // L"\xDF\xF7" in CP936

GCC gives this error:

    error: converting to execution character set: Illegal byte sequence

I believe this is due to the encoding. When GCC finds a wide string literal, it tries to re-encode the literal from the file into the wide string format, and in this process it rejects every multibyte encoding except UTF-8. Converting the source file to UTF-8 would solve this problem but causes another one: most stdio functions still expect code page strings and will print gibberish when given UTF-8 strings.

The final solution: avoid narrow string literals in source files and use wide string literals only.

There is still one problem left: some ISO C macros such as __FILE__ still use the code page encoding, so the following code fails to compile if the file's name contains non-ASCII characters:

    #define TO_WCS2(x) L##x
    #define TO_WCS(x) TO_WCS2(x)
    std::fputws(TO_WCS(__FILE__), stdout);

Personally, I consider this a bug, because GCC does not actually recognize code page strings.

What do you think about this?

Best regards,
2014-01-06
lh_mouse
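For the remaining __FILE__ case, one possible workaround (not proposed in the thread) is to convert the narrow expansion at run time instead of pasting L onto it at compile time, so that the C runtime rather than the compiler decodes the code page bytes. This is a minimal sketch that assumes the bytes of __FILE__ are in the multibyte encoding of the current C locale. On Windows that is typically the ANSI code page (e.g. CP936) after calling std::setlocale(LC_ALL, ""). The helper name to_wide is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of GCC or mingw-w64): decodes a narrow
// string, such as the expansion of __FILE__, using the multibyte encoding
// of the current C locale. Returns an empty string if the input is not
// valid in that encoding.
std::wstring to_wide(const char* narrow) {
    assert(narrow != nullptr);
    // With a null destination, mbstowcs only counts the wide characters
    // (excluding the terminator) without writing anything.
    const std::size_t n = std::mbstowcs(nullptr, narrow, 0);
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    std::wstring out(n + 1, L'\0');         // room for the terminator
    std::mbstowcs(&out[0], narrow, n + 1);  // writes n chars plus L'\0'
    out.resize(n);                          // drop the terminator
    return out;
}
```

Usage would be to call std::setlocale(LC_ALL, "") once at startup and then, for example, std::fputws(to_wide(__FILE__).c_str(), stdout). Because the conversion happens at run time, the code compiles whatever the file name is; the trade-off is that a mismatched locale yields an empty or mis-decoded string instead of a compile error.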