Re: [Mingw-w64-public] Encoding problem with __FILE__ macro

2014-01-11 Thread Ray Donnelly
Putting source files in anything but ASCII-named folders is asking for
trouble. Lots of trouble. Just don't.

On Sat, Jan 11, 2014 at 2:38 PM, lh_mouse  wrote:
> The problem is with the encoding of the source file's path, not the file's
> contents.
> Anyway, I agree with you that it is a good habit to code in plain English. But
> in some situations the file's path inevitably ends up in the program, e.g.
> when you use the assert() macro.
>
> 2014-01-11
> lh_mouse
>
>
> --
> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
> Learn Why More Businesses Are Choosing CenturyLink Cloud For
> Critical Workloads, Development Environments & Everything In Between.
> Get a Quote or Start a Free Trial Today.
> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
> ___
> Mingw-w64-public mailing list
> Mingw-w64-public@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mingw-w64-public



Re: [Mingw-w64-public] Encoding problem with __FILE__ macro

2014-01-11 Thread lh_mouse
The problem is with the encoding of the source file's path, not the file's
contents.
Anyway, I agree with you that it is a good habit to code in plain English. But
in some situations the file's path inevitably ends up in the program, e.g. when
you use the assert() macro.

2014-01-11
lh_mouse




Re: [Mingw-w64-public] Encoding problem with __FILE__ macro

2014-01-11 Thread Ruben Van Boxem
2014/1/6 lh_mouse 

> Hi guys, I have noticed that in mingw-gcc, the __FILE__ macro expands
> using the system's code page encoding (e.g. code page 936 on Simplified
> Chinese Windows systems). I will describe how it causes problems:
>When an MBCS literal is used in the source file, GCC does not check
> whether it is valid; it simply performs a bytewise copy. This is the
> opposite of MSVC, which converts UTF-8 string literals to code page string
> literals. So the following code works in both mingw-gcc and MSVC when
> saved in ANSI text format:
>   std::puts("喵");  // "\xDF\xF7" in CP936
>But the following code will NOT work in GCC:
>   std::fputws(L"喵", stdout);  // L"\xDF\xF7" in CP936
>GCC gives this error:
>   error: converting to execution character set: Illegal byte sequence
>I believe this is due to the encoding. When GCC finds a wide string
> literal, it tries to re-encode the string literal from the file into
> wide string format. In this process GCC rejects any multibyte encoding
> other than UTF-8. Converting the source file to UTF-8 would solve this
> problem but would cause another one: most stdio functions still expect
> code page strings and produce gibberish when given UTF-8 strings.
>The final solution: avoid narrow string literals in source files and use
> only wide string literals.
>There is still one problem left: some ISO C predefined macros such as
> __FILE__ still use the code page encoding, so the following code produces
> compile errors if the file's name contains non-ASCII characters:
>   #define TO_WCS2(x)   L##x
>   #define TO_WCS(x)   TO_WCS2(x)
>   std::fputws(TO_WCS(__FILE__), stdout);
>Personally I consider this a bug, because GCC does not actually
> recognize code page strings.
>What do you think about this?
>

If GCC was properly built with iconv support, you can use the
-finput-charset option to tell the compiler your file's encoding so that it
is read correctly. I'm not quite sure how the __FILE__ macro will be affected.
In general, it is always better to write code in plain English, to avoid
these issues ;-)

See the manual for more information:
http://gcc.gnu.org/onlinedocs/cpp/Invocation.html

Cheers,

Ruben


>
> Best regards,
> 2014-01-06
> lh_mouse
>


[Mingw-w64-public] Encoding problem with __FILE__ macro

2014-01-06 Thread lh_mouse
Hi guys, I have noticed that in mingw-gcc, the __FILE__ macro expands using the 
system's code page encoding (e.g. code page 936 on Simplified Chinese Windows 
systems). I will describe how it causes problems:
   When an MBCS literal is used in the source file, GCC does not check whether
it is valid; it simply performs a bytewise copy. This is the opposite of MSVC,
which converts UTF-8 string literals to code page string literals. So the
following code works in both mingw-gcc and MSVC when saved in ANSI text
format:
  std::puts("喵");  // "\xDF\xF7" in CP936
   But the following code will NOT work in GCC:
  std::fputws(L"喵", stdout);  // L"\xDF\xF7" in CP936
   GCC gives this error:
  error: converting to execution character set: Illegal byte sequence
   I believe this is due to the encoding. When GCC finds a wide string
literal, it tries to re-encode the string literal from the file into wide
string format. In this process GCC rejects any multibyte encoding other than
UTF-8. Converting the source file to UTF-8 would solve this problem but would
cause another one: most stdio functions still expect code page strings and
produce gibberish when given UTF-8 strings.
   The final solution: avoid narrow string literals in source files and use
only wide string literals.
   There is still one problem left: some ISO C predefined macros such as
__FILE__ still use the code page encoding, so the following code produces
compile errors if the file's name contains non-ASCII characters:
  #define TO_WCS2(x)   L##x
  #define TO_WCS(x)   TO_WCS2(x)
  std::fputws(TO_WCS(__FILE__), stdout);
   Personally I consider this a bug, because GCC does not actually recognize
code page strings.
   What do you think about this?

Best regards,
2014-01-06
lh_mouse
