Re: Do not take address of empty string front
On 22/06/15 16:10 +0100, Jonathan Wakely wrote: On 20/06/15 12:59 +0100, Jonathan Wakely wrote: On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; This makes that change, and also moves wstring_convert into the ABI-tagged __cxx11 namespace, and fixes a copypaste error in the exception thrown from wbuffer_convert. This is the equivalent patch for the branch. Tested powerpc64le-linux, committed to gcc-5-branch. commit ab6011c5ffbd16f7f3f509f6e9fec6dc9f7daf36 Author: Jonathan Wakely jwak...@redhat.com Date: Wed Jul 1 15:41:33 2015 +0100 * include/bits/locale_conv.h (wstring_convert): Use __cxx11 inline namespace in new ABI. (wstring_convert::_M_conv): Handle empty range. diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h index 9be2866..de49dd5 100644 --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -51,6 +51,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @{ */ +_GLIBCXX_BEGIN_NAMESPACE_CXX11 /// String conversions templatetypename _Codecvt, typename _Elem = wchar_t, typename _Wide_alloc = allocator_Elem, @@ -192,10 +193,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_conv(const _InChar* __first, const _InChar* __last, const _OutStr* __err, _MemFn __memfn) { + auto __outstr = __err ? _OutStr(__err-get_allocator()) : _OutStr(); + + if (__first == __last) + { + _M_count = 0; + return __outstr; + } + if (!_M_with_cvtstate) _M_state = state_type(); - auto __outstr = __err ? _OutStr(__err-get_allocator()) : _OutStr(); size_t __outchars = 0; auto __next = __first; const auto __maxlen = _M_cvt-max_length() + 1; @@ -239,6 +247,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool _M_with_cvtstate = false; bool _M_with_strings = false; }; +_GLIBCXX_END_NAMESPACE_CXX11 /// Buffer conversions templatetypename _Codecvt, typename _Elem = wchar_t, @@ -264,7 +273,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_buf(__bytebuf), _M_cvt(__pcvt), _M_state(__state) { if (!_M_cvt) - __throw_logic_error(wstring_convert); + __throw_logic_error(wbuffer_convert); _M_always_noconv = _M_cvt-always_noconv();
Re: Do not take address of empty string front
On 23/06/15 22:04 +0200, François Dumont wrote: On 22/06/2015 17:10, Jonathan Wakely wrote: On 20/06/15 12:59 +0100, Jonathan Wakely wrote: On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + Shouldn't it also set __count to 0 to make clear that nothing has been converted. Yes, thanks. Fixed like so. Tested powerpc64le-linux, in debug and normal modes, committed to trunk. commit f02e0e6dfd0ffbef53252aa0b739a9b1ba7391ff Author: Jonathan Wakely jwak...@redhat.com Date: Thu Jun 25 14:51:19 2015 +0100 * include/bits/locale_conv.h (__do_str_codecvt): Set __count. diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h index fd99499..146f78b 100644 --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -61,6 +61,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (__first == __last) { __outstr.clear(); + __count = 0; return true; }
Re: Do not take address of empty string front
On 22/06/2015 17:10, Jonathan Wakely wrote: On 20/06/15 12:59 +0100, Jonathan Wakely wrote: On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? It would if we were using basic_string debug implementation but we aren't. We are just using normal implementation with some debug checks which is not enough to detect invalid deference operations. Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; This makes that change, and also moves wstring_convert into the ABI-tagged __cxx11 namespace, and fixes a copypaste error in the exception thrown from wbuffer_convert. Tested powerpc64le-linux, committed to trunk. François, your changes to add extra checks in std::string are still useful separately. I just applied attached patch then. François Index: include/bits/basic_string.h === --- include/bits/basic_string.h (revision 224914) +++ include/bits/basic_string.h (working copy) @@ -39,6 +39,7 @@ #include ext/atomicity.h #include ext/alloc_traits.h #include debug/debug.h + #if __cplusplus = 201103L #include initializer_list #endif @@ -903,7 +904,10 @@ */ reference front() noexcept - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read-only (constant) reference to the data at the first @@ -911,7 +915,10 @@ */ const_reference front() const noexcept - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read/write reference to the data at the last @@ -919,7 +926,10 @@ */ reference back() noexcept - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } /** * Returns a read-only (constant) reference to the data at the @@ -927,7 +937,10 @@ */ const_reference back() const noexcept - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } #endif // Modifiers: @@ -1506,7 +1519,10 @@ */ void pop_back() noexcept - { _M_erase(size()-1, 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + _M_erase(size() - 1, 1); + } #endif // C++11 /** @@ -3308,7 +3324,10 @@ */ reference front() - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read-only (constant) reference to the data at the first @@ -3316,7 +3335,10 @@ */ const_reference front() const _GLIBCXX_NOEXCEPT - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read/write reference to the data at the last @@ -3324,7 +3346,10 @@ */ reference back() - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } /** * Returns a read-only (constant) reference to the data at the @@ -3332,7 +3357,10 @@ */ const_reference back() const _GLIBCXX_NOEXCEPT - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } #endif // Modifiers: @@ -3819,7 +3847,10 @@ */ void pop_back() // FIXME C++11: should be noexcept. - { erase(size()-1, 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + erase(size() - 1, 1); + } #endif // C++11 /**
Re: Do not take address of empty string front
On 22/06/2015 17:10, Jonathan Wakely wrote: On 20/06/15 12:59 +0100, Jonathan Wakely wrote: On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + Shouldn't it also set __count to 0 to make clear that nothing has been converted. size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; This makes that change, and also moves wstring_convert into the ABI-tagged __cxx11 namespace, and fixes a copypaste error in the exception thrown from wbuffer_convert. Tested powerpc64le-linux, committed to trunk. François, your changes to add extra checks in std::string are still useful separately. Ok I will take care of this part. François
Re: Do not take address of empty string front
On 20/06/15 12:59 +0100, Jonathan Wakely wrote: On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; This makes that change, and also moves wstring_convert into the ABI-tagged __cxx11 namespace, and fixes a copypaste error in the exception thrown from wbuffer_convert. Tested powerpc64le-linux, committed to trunk. François, your changes to add extra checks in std::string are still useful separately. commit 4ab3f0a76f7e18074c91c4644cbfdf23084e93ba Author: Jonathan Wakely jwak...@redhat.com Date: Mon Jun 22 13:47:24 2015 +0100 * include/bits/locale_conv.h (__do_str_codecvt): Handle empty range. (wstring_convert): Move into __cxx11 namespace. (wbuffer_convert(streambuf*, _Codecvt*, state_type)): Fix exception message. diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h index 61b535c..fd99499 100644 --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; @@ -150,6 +156,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __str_codecvt_out(__first, __last, __outstr, __cvt, __state, __n); } +_GLIBCXX_BEGIN_NAMESPACE_CXX11 + /// String conversions templatetypename _Codecvt, typename _Elem = wchar_t, typename _Wide_alloc = allocator_Elem, @@ -301,6 +309,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool _M_with_strings = false; }; +_GLIBCXX_END_NAMESPACE_CXX11 + /// Buffer conversions templatetypename _Codecvt, typename _Elem = wchar_t, typename _Tr = char_traits_Elem @@ -325,7 +335,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_buf(__bytebuf), _M_cvt(__pcvt), _M_state(__state) { if (!_M_cvt) - __throw_logic_error(wstring_convert); + __throw_logic_error(wbuffer_convert); _M_always_noconv = _M_cvt-always_noconv();
Do not take address of empty string front
Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. I kept operator as it is what was used, do you think we should use addressof ? At the same time I improve string debug mode cause with front() implemented as operator[](0) it is ok even if string is empty while back() implemented as operator[](size()-1) is not which is quite inconsistent. So I added a number of !empty() checks in several methods to detect more easily all those invalid usages. * include/bits/basic_string.h (basic_string::front()): Add !empty debug check. (basic_string::back()): Likewise. (basic_string::pop_back()): Likewise. * include/bits/locale_env.h (__do_std_codecvt): Do not take address of string::front() when string is empty. * include/debug/macros.h (__glibcxx_check_string, __glibcxx_check_string_len): Implement using _GLIBCXX_DEBUG_PEDASSERT. Tested under Linux x86_64 debug mode. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 093f502..923fb83 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -39,6 +39,7 @@ #include ext/atomicity.h #include ext/alloc_traits.h #include debug/debug.h + #if __cplusplus = 201103L #include initializer_list #endif @@ -903,7 +904,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ reference front() noexcept - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read-only (constant) reference to the data at the first @@ -911,7 +915,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ const_reference front() const noexcept - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read/write reference to the data at the last @@ -919,7 +926,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ reference back() noexcept - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } /** * Returns a read-only (constant) reference to the data at the @@ -927,7 +937,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ const_reference back() const noexcept - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } #endif // Modifiers: @@ -1506,7 +1519,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ void pop_back() noexcept - { _M_erase(size()-1, 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + _M_erase(size() - 1, 1); + } #endif // C++11 /** @@ -3308,7 +3324,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 */ reference front() - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read-only (constant) reference to the data at the first @@ -3316,7 +3335,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 */ const_reference front() const _GLIBCXX_NOEXCEPT - { return operator[](0); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](0); + } /** * Returns a read/write reference to the data at the last @@ -3324,7 +3346,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 */ reference back() - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } /** * Returns a read-only (constant) reference to the data at the @@ -3332,7 +3357,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 */ const_reference back() const _GLIBCXX_NOEXCEPT - { return operator[](this-size() - 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + return operator[](this-size() - 1); + } #endif // Modifiers: @@ -3819,7 +3847,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 */ void pop_back() // FIXME C++11: should be noexcept. - { erase(size()-1, 1); } + { + _GLIBCXX_DEBUG_ASSERT(!empty()); + erase(size() - 1, 1); + } #endif // C++11 /** diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h index 61b535c..8def30d 100644 --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -66,10 +66,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { __outstr.resize(__outstr.size() + (__last - __next) * __maxlen); - auto __outnext = __outstr.front() + __outchars; - auto const __outlast = __outstr.back() + 1; + auto __outnext = (*__outstr.begin()) +
Re: Do not take address of empty string front
On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; I kept operator as it is what was used, do you think we should use addressof ? I don't expect that function to get used with anything except built-in character types (char, wchar_t, char16_t, char32_t) which don't use ADL so can't find an overloaded operator. At the same time I improve string debug mode cause with front() implemented as operator[](0) it is ok even if string is empty while back() implemented as operator[](size()-1) is not which is quite inconsistent. So I added a number of !empty() checks in several methods to detect more easily all those invalid usages. Nice. These would all be candidates for the debug mode lite that I want. * include/bits/basic_string.h (basic_string::front()): Add !empty debug check. (basic_string::back()): Likewise. (basic_string::pop_back()): Likewise. * include/bits/locale_env.h (__do_std_codecvt): Do not take address of N.B. include/bits/locale_conv.h not locale_env.h