[Bug libstdc++/90281] utf-8 encoded std::filesystem::path can not be converted to utf-16.

2020-02-26 redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90281

Jonathan Wakely changed:

               What|Removed                       |Added
--------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED
   Target Milestone|---                           |9.0

--- Comment #5 from Jonathan Wakely ---
I no longer plan to backport this to the gcc-8 branch, so closing as fixed.

2019-06-17 redi at gcc dot gnu.org

--- Comment #4 from Jonathan Wakely ---
Author: redi
Date: Mon Jun 17 15:03:46 2019
New Revision: 272389

URL: https://gcc.gnu.org/viewcvs?rev=272389&root=gcc&view=rev
Log:
PR libstdc++/90281 Fix string conversions for filesystem::path

Fix several bugs in the encoding conversions for filesystem::path that
prevent conversion of Unicode characters outside the Basic Multilingual
Plane, and prevent returning basic_string specializations with
alternative allocator types.

The std::codecvt_utf8 class template is not suitable for UTF-16
conversions because it uses UCS-2 instead. For conversions between UTF-8
and UTF-16 either std::codecvt<char16_t, char, mbstate_t> or
codecvt_utf8_utf16 must be used.
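The UCS-2 versus UTF-16 distinction can be demonstrated with a minimal sketch, assuming a C++17 compiler where the (deprecated since C++17) `<codecvt>` facets are still available. The helper `converts_fully` and the wrappers `ucs2_ok`/`utf16_ok` are hypothetical, not libstdc++ APIs; they call each facet's standard `in()` member directly and accept the result only if the facet reports `ok` and consumed every input byte. The test character is U+1D11E (MUSICAL SYMBOL G CLEF), whose UTF-8 encoding is the four bytes F0 9D 84 9E.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: convert UTF-8 to char16_t with `cvt`, succeeding only
// if the facet reports ok AND every input byte was consumed.
template <typename Facet>
bool converts_fully(const Facet& cvt, const std::string& utf8,
                    std::u16string& out) {
  std::mbstate_t state{};
  // UTF-8 never needs more UTF-16 code units than it has bytes (a 4-byte
  // sequence becomes a 2-unit surrogate pair); +1 keeps the buffer non-empty.
  std::vector<char16_t> buf(utf8.size() + 1);
  const char* end = utf8.data() + utf8.size();
  const char* from_next = nullptr;
  char16_t* to_next = nullptr;
  auto res = cvt.in(state, utf8.data(), end, from_next,
                    buf.data(), buf.data() + buf.size(), to_next);
  if (res != std::codecvt_base::ok || from_next != end)
    return false;
  out.assign(buf.data(), to_next);
  return true;
}

// U+1D11E MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane.
const std::string g_clef_utf8 = "\xF0\x9D\x84\x9E";

// codecvt_utf8<char16_t> converts UTF-8 <-> UCS-2, so U+1D11E cannot fit.
bool ucs2_ok(std::u16string& out) {
  std::codecvt_utf8<char16_t> ucs2;
  return converts_fully(ucs2, g_clef_utf8, out);
}

// codecvt_utf8_utf16<char16_t> converts UTF-8 <-> UTF-16 (surrogate pairs).
bool utf16_ok(std::u16string& out) {
  std::codecvt_utf8_utf16<char16_t> utf16;
  return converts_fully(utf16, g_clef_utf8, out);
}
```

With these facets, `ucs2_ok` should fail, while `utf16_ok` should yield the surrogate pair 0xD834 0xDD1E.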

The __str_codecvt_in and __str_codecvt_out utilities do not
return false on a partial conversion (e.g. for invalid or incomplete
Unicode input). Add new helpers that treat partial conversions as
errors, and use them for all filesystem::path conversions.
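Why a partial conversion must count as an error can be sketched with two hypothetical checks. They are not libstdc++'s `__str_codecvt_in` or `__str_codecvt_in_all`; they are stand-ins that call `std::codecvt_utf8_utf16<char16_t>::in()` directly. When the input ends in the middle of a multibyte sequence, the facet typically reports `partial` rather than `error`, so a check that only rejects `error` silently returns a truncated result.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

// Outcome of one UTF-8 -> UTF-16 conversion attempt.
struct Conversion {
  std::codecvt_base::result res;
  bool consumed_all;   // did the facet read every input byte?
  std::u16string out;  // whatever the facet produced
};

Conversion convert(const std::string& utf8) {
  std::codecvt_utf8_utf16<char16_t> cvt;
  std::mbstate_t state{};
  std::vector<char16_t> buf(utf8.size() + 1);
  const char* end = utf8.data() + utf8.size();
  const char* from_next = nullptr;
  char16_t* to_next = nullptr;
  auto res = cvt.in(state, utf8.data(), end, from_next,
                    buf.data(), buf.data() + buf.size(), to_next);
  return {res, from_next == end, std::u16string(buf.data(), to_next)};
}

// Hypothetical lenient check: only an explicit `error` counts as failure.
bool lenient_in(const std::string& utf8, std::u16string& out) {
  Conversion c = convert(utf8);
  if (c.res == std::codecvt_base::error) return false;
  out = c.out;  // may be only a prefix of the input!
  return true;
}

// Hypothetical strict check: anything short of a complete conversion fails.
bool strict_in(const std::string& utf8, std::u16string& out) {
  Conversion c = convert(utf8);
  if (c.res != std::codecvt_base::ok || !c.consumed_all) return false;
  out = c.out;
  return true;
}
```

For the input "a" followed by only the first three bytes of U+1D11E, `lenient_in` reports success with just u"a", while `strict_in` rejects it.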

PR libstdc++/90281 Fix string conversions for filesystem::path
* include/bits/fs_path.h (u8path) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]:
Use codecvt_utf8_utf16 instead of codecvt_utf8. Use
__str_codecvt_in_all to fail for partial conversions and throw on
error.
[!_GLIBCXX_FILESYSTEM_IS_WINDOWS && _GLIBCXX_USE_CHAR8_T]
(path::_Cvt): Add explicit specialization.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_wconvert): Remove
overloads.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
if-constexpr instead of dispatching to _S_wconvert. Use codecvt
instead of codecvt_utf8. Use __str_codecvt_in_all and
__str_codecvt_out_all.
[!_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
codecvt instead of codecvt_utf8. Use __str_codecvt_out_all.
(path::_S_str_convert) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Construct return values
with allocator. Use __str_codecvt_out_all. Fallthrough to POSIX code
after converting to UTF-8.
(path::_S_str_convert): Use codecvt instead of codecvt_utf8. Use
__str_codecvt_in_all.
(path::string): Fix initialization of string types with different
allocators.
(path::u8string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Use __str_codecvt_out_all.
* include/bits/locale_conv.h (__do_str_codecvt): Reorder static and
runtime conditions.
(__str_codecvt_out_all, __str_codecvt_in_all): New functions that
return false for partial conversions.
* include/experimental/bits/fs_path.h (u8path):
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Implement correctly for mingw.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_wconvert): Add
missing handling for char8_t. Use codecvt and codecvt_utf8_utf16
instead of codecvt_utf8. Use __str_codecvt_in_all and
__str_codecvt_out_all.
[!_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
codecvt instead of codecvt_utf8. Use __str_codecvt_out_all.
(path::string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Construct return values
with allocator. Use __str_codecvt_out_all and __str_codecvt_in_all.
(path::string) [!_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
__str_codecvt_in_all.
(path::u8string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Use __str_codecvt_out_all.
* src/c++17/fs_path.cc (path::_S_convert_loc): Use
__str_codecvt_in_all.
* src/filesystem/path.cc (path::_S_convert_loc): Likewise.
* testsuite/27_io/filesystem/path/construct/90281.cc: New test.
* testsuite/27_io/filesystem/path/factory/u8path.cc: New test.
* testsuite/27_io/filesystem/path/native/string.cc: Test with empty
strings and with Unicode characters outside the basic multilingual
plane.
* testsuite/27_io/filesystem/path/native/alloc.cc: New test.
* testsuite/experimental/filesystem/path/construct/90281.cc: New test.
* testsuite/experimental/filesystem/path/factory/u8path.cc: New test.
* testsuite/experimental/filesystem/path/native/alloc.cc: New test.
* testsuite/experimental/filesystem/path/native/string.cc: Test with
empty strings and with Unicode characters outside the basic
multilingual plane.
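A minimal sketch of the behaviour the new construct/90281.cc and native/alloc.cc tests are described as covering. It assumes a POSIX target (so the native path encoding is UTF-8) and a libstdc++ that contains this fix. It is not the actual testsuite code, and `SimpleAlloc` is a hypothetical minimal allocator used only to exercise the allocator-aware `path::string<CharT, Traits, Allocator>()` overload.

```cpp
#include <cstddef>
#include <filesystem>
#include <memory>
#include <string>

namespace fs = std::filesystem;

// Hypothetical minimal allocator that forwards to std::allocator.
template <typename T>
struct SimpleAlloc {
  using value_type = T;
  SimpleAlloc() = default;
  template <typename U>
  SimpleAlloc(const SimpleAlloc<U>&) noexcept {}
  T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>{}.deallocate(p, n);
  }
};
template <typename T, typename U>
bool operator==(const SimpleAlloc<T>&, const SimpleAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const SimpleAlloc<T>&, const SimpleAlloc<U>&) { return false; }

using u16string_alloc =
    std::basic_string<char16_t, std::char_traits<char16_t>, SimpleAlloc<char16_t>>;

// On POSIX the path keeps these UTF-8 bytes unchanged as its native string.
fs::path g_clef_path() { return fs::path(std::string("\xF0\x9D\x84\x9E")); }

// Native UTF-8 -> UTF-16 with the default allocator.
std::u16string as_u16(const fs::path& p) { return p.u16string(); }

// Native UTF-8 -> UTF-16, returned in a string with a non-default allocator.
u16string_alloc as_u16_alloc(const fs::path& p) {
  return p.string<char16_t, std::char_traits<char16_t>, SimpleAlloc<char16_t>>(
      SimpleAlloc<char16_t>{});
}
```

Before the fix, the `u16string()` call is the one that failed for characters outside the Basic Multilingual Plane.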

Added:
    branches/gcc-9-branch/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90281.cc
    branches/gcc-9-branch/libstdc++-v3/testsuite/27_io/filesystem/path/factory/
    branches/gcc-9-branch/libstdc++-v3/testsuite/27_io/filesystem/path/factory/u8path.cc
      - copied, changed from r272374, branches/gcc-9-branch/libstdc++-v3/testsuite/27_io/filesystem/path/native/string.cc
    branches/gcc-9-branch/l

2019-06-17 redi at gcc dot gnu.org

--- Comment #3 from Jonathan Wakely ---
Author: redi
Date: Mon Jun 17 14:19:04 2019
New Revision: 272385

URL: https://gcc.gnu.org/viewcvs?rev=272385&root=gcc&view=rev
Log:
PR libstdc++/90281 Fix string conversions for filesystem::path

Fix several bugs in the encoding conversions for filesystem::path that
prevent conversion of Unicode characters outside the Basic Multilingual
Plane, and prevent returning basic_string specializations with
alternative allocator types.

The std::codecvt_utf8 class template is not suitable for UTF-16
conversions because it uses UCS-2 instead. For conversions between UTF-8
and UTF-16 either std::codecvt<char16_t, char, mbstate_t> or
codecvt_utf8_utf16 must be used.

The __str_codecvt_in and __str_codecvt_out utilities do not
return false on a partial conversion (e.g. for invalid or incomplete
Unicode input). Add new helpers that treat partial conversions as
errors, and use them for all filesystem::path conversions.

PR libstdc++/90281 Fix string conversions for filesystem::path
* include/bits/fs_path.h (u8path) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]:
Use codecvt_utf8_utf16 instead of codecvt_utf8. Use
__str_codecvt_in_all to fail for partial conversions and throw on
error.
[!_GLIBCXX_FILESYSTEM_IS_WINDOWS && _GLIBCXX_USE_CHAR8_T]
(path::_Cvt): Add explicit specialization.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_wconvert): Remove
overloads.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
if-constexpr instead of dispatching to _S_wconvert. Use codecvt
instead of codecvt_utf8. Use __str_codecvt_in_all and
__str_codecvt_out_all.
[!_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
codecvt instead of codecvt_utf8. Use __str_codecvt_out_all.
(path::_S_str_convert) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Construct return values
with allocator. Use __str_codecvt_out_all. Fallthrough to POSIX code
after converting to UTF-8.
(path::_S_str_convert): Use codecvt instead of codecvt_utf8. Use
__str_codecvt_in_all.
(path::string): Fix initialization of string types with different
allocators.
(path::u8string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Use __str_codecvt_out_all.
* include/bits/locale_conv.h (__do_str_codecvt): Reorder static and
runtime conditions.
(__str_codecvt_out_all, __str_codecvt_in_all): New functions that
return false for partial conversions.
* include/experimental/bits/fs_path.h (u8path):
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Implement correctly for mingw.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_wconvert): Add
missing handling for char8_t. Use codecvt and codecvt_utf8_utf16
instead of codecvt_utf8. Use __str_codecvt_in_all and
__str_codecvt_out_all.
[!_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
codecvt instead of codecvt_utf8. Use __str_codecvt_out_all.
(path::string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Construct return values
with allocator. Use __str_codecvt_out_all and __str_codecvt_in_all.
(path::string) [!_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
__str_codecvt_in_all.
(path::u8string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
codecvt_utf8_utf16 instead of codecvt_utf8. Use __str_codecvt_out_all.
* src/c++17/fs_path.cc (path::_S_convert_loc): Use
__str_codecvt_in_all.
* src/filesystem/path.cc (path::_S_convert_loc): Likewise.
* testsuite/27_io/filesystem/path/construct/90281.cc: New test.
* testsuite/27_io/filesystem/path/factory/u8path.cc: New test.
* testsuite/27_io/filesystem/path/native/string.cc: Test with empty
strings and with Unicode characters outside the basic multilingual
plane.
* testsuite/27_io/filesystem/path/native/alloc.cc: New test.
* testsuite/experimental/filesystem/path/construct/90281.cc: New test.
* testsuite/experimental/filesystem/path/factory/u8path.cc: New test.
* testsuite/experimental/filesystem/path/native/alloc.cc: New test.
* testsuite/experimental/filesystem/path/native/string.cc: Test with
empty strings and with Unicode characters outside the basic
multilingual plane.

Added:
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90281.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/factory/
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/factory/u8path.cc
      - copied, changed from r272381, trunk/libstdc++-v3/testsuite/27_io/filesystem/path/native/string.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/native/alloc.cc
    trunk/libstdc++-

2019-04-30 redi at gcc dot gnu.org

Jonathan Wakely changed:

               What|Removed                       |Added
--------------------------------------------------------------------------
           Keywords|                              |patch

--- Comment #2 from Jonathan Wakely ---
Patch posted: https://gcc.gnu.org/ml/gcc-patches/2019-04/msg01242.html

2019-04-29 redi at gcc dot gnu.org

--- Comment #1 from Jonathan Wakely ---
The problem is that I'm using codecvt_utf8, which converts between
UTF-8 and UCS-2 (not UTF-16). U+1D11E is outside the Basic Multilingual
Plane, so it is not valid UCS-2.

I need to use a different codecvt facet for UTF-16.
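The reason U+1D11E is not valid UCS-2 can be shown with the UTF-16 encoding arithmetic. A UCS-2 code unit is a single 16-bit value, so any code point above U+FFFF is out of range. UTF-16 instead subtracts 0x10000 and splits the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The function names below are illustrative only.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// UCS-2: exactly one 16-bit unit per character. Code points above U+FFFF do
// not fit, and D800-DFFF are reserved for UTF-16 surrogates.
std::optional<std::uint16_t> encode_ucs2(char32_t cp) {
  if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return std::nullopt;
  return static_cast<std::uint16_t>(cp);
}

// UTF-16: code points above U+FFFF become a surrogate pair.
// Assumes `cp` is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
std::vector<std::uint16_t> encode_utf16(char32_t cp) {
  if (cp <= 0xFFFF)
    return {static_cast<std::uint16_t>(cp)};
  const char32_t v = cp - 0x10000;                            // 20-bit value
  return {static_cast<std::uint16_t>(0xD800 + (v >> 10)),     // high surrogate
          static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))};  // low surrogate
}
```

For U+1D11E: 0x1D11E - 0x10000 = 0xD11E, whose top 10 bits are 0x34 and bottom 10 bits are 0x11E, giving the pair D834 DD1E. `encode_ucs2` has no valid result for it.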

2019-04-29 redi at gcc dot gnu.org

Jonathan Wakely changed:

               What|Removed                       |Added
--------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2019-04-29
           Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org
     Ever confirmed|0                             |1