[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris SPARC when parsing a simple character class

2018-05-18 Thread timshen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #6 from Tim Shen  ---
(In reply to Tim Shen from comment #5)
> First of all, std::regex("[0-9]") shouldn't be locale sensitive, as
> regex_constants::collate is set.

Correction: as regex_constants::collate is *not* set.

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris SPARC when parsing a simple character class

2018-05-18 Thread timshen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #5 from Tim Shen  ---
(In reply to Jonathan Wakely from comment #4)
> Tim, I'll take care of checking errno in collate<>::_M_transform but could
> you advise what to do about the regex compiler? Maybe:
> 
> --- a/libstdc++-v3/include/bits/regex.h
> +++ b/libstdc++-v3/include/bits/regex.h
> @@ -257,7 +257,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>   const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
>   std::vector<char_type> __s(__first, __last);
>   __fctyp.tolower(__s.data(), __s.data() + __s.size());
> - return this->transform(__s.data(), __s.data() + __s.size());
> + __try {
> +   return this->transform(__s.data(), __s.data() + __s.size());
> + } catch(const std::runtime_error&) {
> +   return string_type();
> + }
> }
>  
>    /**

First of all, std::regex("[0-9]") shouldn't be locale sensitive, as
regex_constants::collate is set. If somehow a locale-related exception was
thrown without collate being set, it's a bug in the regex implementation and we
should fix it. We probably have a bug in _BracketMatcher::_M_apply().

When collate is set, we still don't want the regex constructor to eagerly
propagate exceptions. I think regex_traits<>::transform_primary should be
exception neutral (unless it's specified otherwise). Instead, we should somehow
fix regex's constructor so that it doesn't let exceptions escape from
_BracketMatcher::_M_make_cache().

Regarding the compile-time variable __collate in _BracketMatcher, I suggest
changing _BracketMatcher as follows:
* If !__collate or -fno-exceptions, nothing needs to be changed; otherwise
* change the element type of the cache from bool to a 3-state enum, e.g.
enum { MATCHED, NOT_MATCHED, NOT_CACHED }. When an exception happens in
_M_make_cache, catch it and set the cache result to NOT_CACHED. During regex
matching, a NOT_CACHED result requires a full run of _M_apply(), which will
likely throw.

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris SPARC when parsing a simple character class

2018-05-18 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timshen at gcc dot gnu.org

--- Comment #4 from Jonathan Wakely  ---
(In reply to Wanying Luo from comment #0)
> When _M_transform() calls strxfrm() and gets -1 when converting 0x80 under
> the UTF-8 locale on Solaris SPARC, it simply assigns -1 to __res of type
> size_t which creates a very large number. This causes __ret.append(__c,
> __res) to crash. I think it would be nice if the code checks errno and
> issues a better error message than the one above.

N.B. it doesn't just crash, it throws an exception because it can't append
4294967295 bytes to a std::string. Any fix to check errno in
collate::do_transform is still going to involve throwing an exception,
just a slightly different one.

The real problem is that std::regex wants to build a cache of every value from
CHAR_MIN to CHAR_MAX, to decide if it matches the bracket expression "[0-9]".
If calling strxfrm on any 8-bit char value produces an error then we're going
to get an exception. I think something in the regex compiler (maybe in
transform_primary) needs to handle those exceptions (and either decide the
characters that produce errors do not match, or maybe disable the cache?)

Tim, I'll take care of checking errno in collate<>::_M_transform but could you
advise what to do about the regex compiler? Maybe:

--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -257,7 +257,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
  std::vector<char_type> __s(__first, __last);
  __fctyp.tolower(__s.data(), __s.data() + __s.size());
- return this->transform(__s.data(), __s.data() + __s.size());
+ __try {
+   return this->transform(__s.data(), __s.data() + __s.size());
+ } catch(const std::runtime_error&) {
+   return string_type();
+ }
}

   /**

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris SPARC when parsing a simple character class

2018-05-17 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-17
     Ever confirmed|0                           |1

--- Comment #3 from Jonathan Wakely  ---
(In reply to Wanying Luo from comment #0)
> gcc version 4.9.2 (GCC) 

The oldest currently supported release is GCC 6.4, but this doesn't appear to
have been fixed in the meantime.

> In libstdc++-v3/include/bits/locale_classes.tcc, do_transform() is defined
> as follows:
> 
> do_transform(const _CharT* __lo, const _CharT* __hi) const
> {
> ...
>   size_t __res = _M_transform(__c, __p, __len);
> ...
>   __ret.append(__c, __res);
> 
> 
> When _M_transform() calls strxfrm() and gets -1 when converting 0x80 under
> the UTF-8 locale on Solaris SPARC, it simply assigns -1 to __res of type
> size_t which creates a very large number. This causes __ret.append(__c,
> __res) to crash.

Well the value returned is already a size_t, so it's already a very large
number (not -1), and we do check for larger values than expected, but we don't
check for errors.

> I think it would be nice if the code checks errno and
> issues a better error message than the one above.

Yes, we need to check errno for errors from strxfrm.

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class

2018-05-17 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #2 from Wanying Luo  ---
(In reply to Wanying Luo from comment #0)
> When I ran this test on a Linux machine with GCC 4.9.2, glibc's strxfrm()
> converts 0x80 to 6 bytes.

Pasting my test on Linux with the same version of GCC for completeness.


$ cat test.cpp
#include <locale.h>
#include <regex>

int main (int argc, char *argv[]) {
    setlocale(LC_ALL, "");
    std::regex("[0-9]");
}

$ echo $LANG
en_US.UTF-8

$ g++ -std=c++11 test.cpp

$ ./a.out 

$ cat more_test.cpp 
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[]) {
    setlocale(LC_ALL, "");
    char a[] = { 0x80, '\0' };
    printf("%lu\n", strxfrm(NULL, a, 0));
    printf("%s\n", strerror(errno));
}

$ g++ -std=c++11 -w more_test.cpp 

$ ./a.out 
6
Success

$ uname -a
Linux d-ubuntu12x64-11 3.2.0-126-generic #169-Ubuntu SMP Fri Mar 31 14:15:21
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/wluo/othello/linux64-packages/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.2/configure --prefix=/usr
--with-local-prefix=/usr/local --enable-languages=c,c++,fortran --disable-nls
--disable-libcilkrts --disable-lto --enable-libstdcxx-time
--enable-clocale=generic
--with-stage1-ldflags='-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib64
-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib'
--with-boot-ldflags='-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib64
-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib'
--disable-werror --disable-multiarch --disable-bootstrap
Thread model: posix
gcc version 4.9.2 (GCC)

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class

2018-05-17 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #1 from Wanying Luo  ---
Here's GDB backtrace at the time of crash.


#0  0xf56fe7a0 in __lwp_sigqueue () from /lib/libc.so.1
#1  0xf56a1e90 in raise () from /lib/libc.so.1
#2  0xf567a274 in abort () from /lib/libc.so.1
#3  0xff2f2d70 in __gnu_cxx::__verbose_terminate_handler ()
at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#4  0xff2ef844 in __cxxabiv1::__terminate (handler=0xff2f2bac
<__gnu_cxx::__verbose_terminate_handler()>)
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#5  0xff2ef8e8 in std::terminate () at
../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#6  0xff2efc68 in __cxxabiv1::__cxa_rethrow () at
../../../../libstdc++-v3/libsupc++/eh_throw.cc:125
#7  0xff29c974 in std::collate::do_transform (this=0xff34d9f8 <(anonymous
namespace)::collate_c>, 
__lo=0x4fb3c "\200", __hi=0x4fb3d "")
at
/tmp/wluo/gcc-4.9.2/build/sparc-sun-solaris2.11/libstdc++-v3/include/bits/locale_classes.tcc:245
#8  0xff29c25c in std::collate::transform (this=0xff34d9f8 <(anonymous
namespace)::collate_c>, 
__lo=0x4fb3c "\200", __hi=0x4fb3d "")
at
/tmp/wluo/gcc-4.9.2/build/sparc-sun-solaris2.11/libstdc++-v3/include/bits/locale_classes.h:662
#9  0x0002ead4 in std::string std::regex_traits::transform(char*,
char*) const ()
#10 0x0002c634 in std::string
std::regex_traits::transform_primary(char*, char*) const ()
#11 0x000275f8 in std::__detail::_BracketMatcher::_M_apply(char, std::integral_constant) const ()
#12 0x00022bb4 in std::__detail::_BracketMatcher::_M_make_cache(std::integral_constant) ()
#13 0x0001ed70 in std::__detail::_BracketMatcher::_M_ready() ()
#14 0x0001f958 in void std::__detail::_Compiler::_M_insert_bracket_matcher(bool) ()
#15 0x0001c630 in std::__detail::_Compiler::_M_bracket_expression() ()
#16 0x000192e8 in std::__detail::_Compiler::_M_atom() ()
#17 0x00017910 in std::__detail::_Compiler::_M_term() ()
#18 0x00015868 in std::__detail::_Compiler::_M_alternative() ()
#19 0x000141dc in std::__detail::_Compiler::_M_disjunction() ()
#20 0x0001381c in std::__detail::_Compiler::_Compiler(char const*, char const*, std::regex_traits const&,
std::regex_constants::syntax_option_type) ()
#21 0x00013340 in std::shared_ptr
> std::__detail::__compile_nfa(std::regex_traits::char_type const*, std::regex_traits::char_type
const*, std::regex_traits const&,
std::regex_constants::syntax_option_type) ()
#22 0x0001307c in std::basic_regex::basic_regex(char const*, char const*,
std::regex_constants::syntax_option_type) ()
#23 0x00012d84 in std::basic_regex::basic_regex(char const*, std::regex_constants::syntax_option_type) ()
#24 0x000120d0 in main ()