Issue 89676
Summary [libc++] wstring_convert::from_bytes Fails on Identity Conversions
Labels libc++
Reporter tommkelly
It appears that [`wstring_convert::from_bytes(const char*, const char*)`](https://en.cppreference.com/w/cpp/locale/wstring_convert/from_bytes) can fail erroneously, even throwing an exception, when specialized with the identity conversion and `Elem` type `char`, that is:
```
std::wstring_convert<std::codecvt<char, char, std::mbstate_t>, char>
```
This came up for me when writing cross-platform code meant to compile on both Windows and Linux. I needed to format file names as input for [`std::filesystem::file_size`](https://en.cppreference.com/w/cpp/filesystem/file_size), so I defined a `wstring_convert` in terms of `std::filesystem::path::value_type`, which should be `wchar_t` on Windows and `char` on Linux.

In the latter case, one would expect `from_bytes` to return a `basic_string<char>` identical to the input. Instead, the method threw a `range_error`, which is the documented behavior when `from_bytes` encounters an error and no wide-error string was supplied to the `wstring_convert` constructor.
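
A minimal reproducer sketch, assuming a POSIX system (where `std::filesystem::path::value_type` is `char`) and C++17. Because `std::codecvt` has a protected destructor and `wstring_convert` deletes the facet it owns, the facet is wrapped in `deletable_facet`, a helper in the style of the cppreference examples rather than a standard type. `try_identity_from_bytes` is likewise a hypothetical name used only for illustration. `wstring_convert` is deprecated since C++17, so deprecation warnings are expected.

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>     // std::codecvt, std::wstring_convert
#include <optional>
#include <stdexcept>  // std::range_error
#include <string>
#include <utility>    // std::forward

// Hypothetical helper: std::codecvt has a protected destructor, but
// wstring_convert deletes its facet, so expose a public destructor.
template <class Facet>
struct deletable_facet : Facet {
  template <class... Args>
  deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
  ~deletable_facet() {}
};

// The identity conversion from the report (Elem == char), which is what a
// wstring_convert keyed on std::filesystem::path::value_type becomes on POSIX.
using IdentityConvert =
    std::wstring_convert<deletable_facet<std::codecvt<char, char, std::mbstate_t>>, char>;

// Returns the result of from_bytes, or std::nullopt if it threw std::range_error
// (the failure mode described above for libc++).
std::optional<std::string> try_identity_from_bytes(const std::string& bytes) {
  IdentityConvert conv;
  try {
    return conv.from_bytes(bytes.data(), bytes.data() + bytes.size());
  } catch (const std::range_error&) {
    return std::nullopt;
  }
}
```

On an affected libc++, `try_identity_from_bytes("data.bin")` would be expected to return `std::nullopt`; an implementation that handles `noconv` should return `"data.bin"` unchanged.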

I believe the culprit lies [here](https://github.com/llvm/llvm-project/blob/b8ff08d0e668e5397dd799b76ede0bd54fcba75c/libcxx/include/locale#L3225) in **llvm-project/libcxx/include/locale**:
```
__r = __cvtptr_->in(__st, __frm, __frm_end, __frm_nxt, __to, __to_end, __to_nxt);
__cvtcount_ += __frm_nxt - __frm;
if (__frm_nxt == __frm) {
  __r = codecvt_base::error;
} else if (__r == codecvt_base::noconv) {
  __ws.resize(__to - &__ws[0]);
  // This only gets executed if _Elem is char
  __ws.append((const _Elem*)__frm, (const _Elem*)__frm_end);
  __frm = __frm_nxt;
  __r   = codecvt_base::ok;
} else if ...
```
From the [documentation](https://en.cppreference.com/w/cpp/locale/codecvt/in) of `std::codecvt::in`:
> Leaves `from_next` and `to_next` pointing one beyond the last element successfully converted.
>
> ...
>
> If this `codecvt` facet does not define a conversion, no characters are converted. `to_next` is set to be equal to `to`, `state` is unchanged, and [`std::codecvt_base::noconv`](https://en.cppreference.com/w/cpp/locale/codecvt_base) is returned.

This unfortunately doesn't state the value of `from_next` in the `noconv` case explicitly. However, since no characters are converted, the first quoted sentence implies that `from_next` is left equal to `from` when `in` returns `std::codecvt_base::noconv`, just as `to_next` is left equal to `to`. My own observations in a debugger corroborate this.
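
The same observation can be checked without a debugger. The sketch below, using the hypothetical names `InProbe` and `probe_identity_in`, calls `in` directly on the `codecvt<char, char, std::mbstate_t>` facet of `std::locale::classic()` and records where the `from_next` and `to_next` out-parameters end up.

```cpp
#include <cwchar>  // std::mbstate_t
#include <locale>  // std::codecvt, std::codecvt_base, std::use_facet

// Hypothetical record of one in() call on the identity facet.
struct InProbe {
  std::codecvt_base::result result;
  bool from_next_at_from;  // true if from_next == from after the call
  bool to_next_at_to;      // true if to_next == to after the call
};

// Calls in() on the classic locale's identity facet and reports where the
// out-parameters were left.
InProbe probe_identity_in() {
  using Cvt = std::codecvt<char, char, std::mbstate_t>;
  const Cvt& cvt = std::use_facet<Cvt>(std::locale::classic());

  const char from[] = "file.txt";
  const char* from_end = from + sizeof(from) - 1;  // exclude the terminating '\0'
  char to[16];
  const char* from_next = nullptr;
  char* to_next = nullptr;
  std::mbstate_t state{};

  std::codecvt_base::result r =
      cvt.in(state, from, from_end, from_next, to, to + sizeof(to), to_next);
  return {r, from_next == from, to_next == to};
}
```

Per the rule quoted above, the identity facet would be expected to report `noconv` with both pointers left at their starting positions, which is exactly the state that trips the `__frm_nxt == __frm` check.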

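In other words, this implementation of `from_bytes` appears to short-circuit its own `noconv` branch. Because `in` leaves `__frm_nxt == __frm` when it returns `noconv`, the preceding `__frm_nxt == __frm` check fires first and sets `__r` to `codecvt_base::error`, so the `noconv` branch is never reached.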
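
One possible fix is to test for `noconv` before treating "no input consumed" as an error. The sketch below is not libc++'s code: `from_bytes_sketch` is a hypothetical free function, assuming C++17 for `if constexpr`, that mirrors the loop's structure with the checks reordered and throws `std::range_error` where `from_bytes` would.

```cpp
#include <cwchar>       // std::mbstate_t
#include <locale>       // std::codecvt, std::codecvt_base
#include <stdexcept>    // std::range_error
#include <string>
#include <type_traits>  // std::is_same_v

// Hypothetical sketch: convert [first, last) with cvt.in(), handling noconv
// *before* the "no progress" check that currently shadows it in libc++.
template <class Elem>
std::basic_string<Elem> from_bytes_sketch(
    const std::codecvt<Elem, char, std::mbstate_t>& cvt,
    const char* first, const char* last) {
  std::basic_string<Elem> out;
  std::mbstate_t state{};
  Elem buf[32];
  while (first != last) {
    const char* from_next = first;
    Elem* to_next = buf;
    std::codecvt_base::result r =
        cvt.in(state, first, last, from_next, buf, buf + 32, to_next);
    if (r == std::codecvt_base::noconv) {
      // noconv leaves from_next == first, so it must be checked first.
      if constexpr (std::is_same_v<Elem, char>) {
        out.append(first, last);  // identity: copy the remaining bytes verbatim
        return out;
      } else {
        throw std::range_error("from_bytes_sketch: unexpected noconv");
      }
    }
    out.append(buf, to_next);  // keep whatever was converted this round
    if (r == std::codecvt_base::error || from_next == first) {
      // An explicit error, or no input consumed without noconv: give up.
      throw std::range_error("from_bytes_sketch: conversion failed");
    }
    first = from_next;
  }
  return out;
}
```

With the checks in this order, the identity facet takes the `noconv` branch and returns the input unchanged, while a conversion that genuinely stalls is still reported as an error.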