[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

Jonathan Wakely changed:

What|Removed|Added
Last reconfirmed|2014-06-25 00:00:00|2021-12-16
Assignee|timshen at gcc dot gnu.org|redi at gcc dot gnu.org

--- Comment #24 from Jonathan Wakely ---
(In reply to Maksymilian Arciemowicz from comment #12)
> Ups. Check this (.*{100}{300})

This one still results in a stack overflow on trunk, with an 8MB stack. That is:

std::regex_match("a", std::regex("(.*{100}{300})"));

I have a proof-of-concept patch replacing the recursion in _Executor. The example above runs successfully with a 16k stack limit.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #23 from Jonathan Wakely ---
(In reply to M Welinder from comment #22)
> FWIW, there is an excellent overview of regular expression engine pitfalls
> and methods here:
>
> https://swtch.com/~rsc/regexp/regexp1.html
> https://swtch.com/~rsc/regexp/regexp2.html
> https://swtch.com/~rsc/regexp/regexp3.html

Yes, there have been links to the first one in libstdc++ headers since 2013.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

M Welinder changed:

What|Removed|Added
CC||terra at gnome dot org

--- Comment #22 from M Welinder ---
FWIW, there is an excellent overview of regular expression engine pitfalls and methods here:

https://swtch.com/~rsc/regexp/regexp1.html
https://swtch.com/~rsc/regexp/regexp2.html
https://swtch.com/~rsc/regexp/regexp3.html

Those are about 10 years old, but not outdated. The TL;DR is "use NFAs and DFAs, not back-tracking".
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #21 from Tim Shen ---
(In reply to Pádraig Brady from comment #20)
> Any status update on this. GCC7 is looming...
> Thanks.

Unfortunately I haven't had a chance to work on this. I plan to put up a one-line tweak to the internal state limit to make the library throw an exception instead of crashing. That's probably a strict improvement.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Pádraig Brady changed: What|Removed |Added CC||P at draigBrady dot com --- Comment #20 from Pádraig Brady --- Any status update on this. GCC7 is looming... Thanks.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Tim Shen changed: What|Removed |Added CC||chaoskeeper at mail dot ru --- Comment #19 from Tim Shen --- *** Bug 70459 has been marked as a duplicate of this bug. ***
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Tim Shen changed: What|Removed |Added CC||bisqwit at iki dot fi --- Comment #18 from Tim Shen --- *** Bug 70411 has been marked as a duplicate of this bug. ***
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Tim Shen changed: What|Removed |Added CC||kerukuro at gmail dot com --- Comment #17 from Tim Shen --- *** Bug 68688 has been marked as a duplicate of this bug. ***
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Tim Shen timshen at gcc dot gnu.org changed: What|Removed |Added CC||morandidodo at gmail dot com --- Comment #15 from Tim Shen timshen at gcc dot gnu.org --- *** Bug 66456 has been marked as a duplicate of this bug. ***
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Tim Shen timshen at gcc dot gnu.org changed: What|Removed |Added CC||antialize at gmail dot com --- Comment #16 from Tim Shen timshen at gcc dot gnu.org --- *** Bug 67212 has been marked as a duplicate of this bug. ***
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 --- Comment #13 from Maksymilian Arciemowicz max at cert dot cx --- @Tim: do you need help?
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #14 from Tim Shen timshen at gcc dot gnu.org ---
(In reply to Maksymilian Arciemowicz from comment #13)
> @Tim: do you need help?

This is what I'm going to do: https://gcc.gnu.org/ml/libstdc++/2014-07/msg8.html

Please send to libstdc++ ml if you have any ideas.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #12 from Maksymilian Arciemowicz max at cert dot cx ---
Ups. Check this (.*{100}{300})

gcc version 4.10.0 20140701 (experimental) (GCC)

Starting program: /home/cx/REtrunk/kozak5/t3 '(.*{100}{300})'

Program received signal SIGSEGV, Segmentation fault.
0x0040c22a in std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, true>::_M_dfs(std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, true>::_Match_mode, long) ()
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #11 from Tim Shen timshen at gcc dot gnu.org ---
Author: timshen
Date: Tue Jul 1 03:05:45 2014
New Revision: 212185

URL: https://gcc.gnu.org/viewcvs?rev=212185&root=gcc&view=rev
Log:
PR libstdc++/61061
PR libstdc++/61582
* include/bits/regex_automaton.h (_NFA::_M_insert_state): Add an NFA state limit. If it's exceeded, regex_constants::error_space will be thrown.
* include/bits/regex_automaton.tcc (_StateSeq::_M_clone): Use map (which is sparse) instead of vector. This reduces the cost of n clones from O(n^2) to O(n).
* include/std/regex: Add map dependency.
* testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New testcase.

Added:
trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc
Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/include/bits/regex_automaton.h
trunk/libstdc++-v3/include/bits/regex_automaton.tcc
trunk/libstdc++-v3/include/std/regex
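For readers who want a concrete picture of the state-limit idea in this commit, here is a minimal sketch. It is not the libstdc++ code: `TinyNfa`, `insert_state`, and the `int` payload are hypothetical names invented for illustration, and the limit is passed in by the caller rather than taken from the library. The only real library pieces used are `std::regex_error` and `std::regex_constants::error_space`.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <vector>

// Hypothetical stand-in for an NFA under construction. The real type is
// std::__detail::_NFA, and its internals differ from this sketch.
class TinyNfa {
public:
    explicit TinyNfa(std::size_t state_limit) : limit_(state_limit) {
        assert(state_limit > 0);
    }

    // Append a state and return its index. Refuse to grow past the limit,
    // so an oversized pattern fails at construction time with an exception.
    std::size_t insert_state(int payload) {
        if (states_.size() >= limit_)
            throw std::regex_error(std::regex_constants::error_space);
        states_.push_back(payload);
        return states_.size() - 1;
    }

    std::size_t size() const { return states_.size(); }

private:
    std::size_t limit_;
    std::vector<int> states_;  // placeholder payload, one int per state
};
```

A caller that builds states in a loop would see `std::regex_error` with `code() == std::regex_constants::error_space` as soon as the limit is reached, instead of running out of memory later.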
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #7 from Tim Shen timshen at gcc dot gnu.org ---
(.*{100}{100}{100}) seems to be a stack overflow. It's because the regex executor uses recursion. It could be fixed (memory exhaustion rather than a segfault) by using a std::stack to simulate the recursion; IMHO, however, directly throwing regex_error::error_space is the right thing to do here.
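As an illustration of the "std::stack instead of recursion" idea mentioned in this comment, here is a small sketch. It is not the libstdc++ executor: `Kind`, `State`, and `run_nfa` are hypothetical names, the NFA is hand-built, and the `seen` table is an added assumption to keep the search from revisiting the same (state, position) pair. It only shows the general technique of keeping pending work on a heap-allocated stack.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified NFA node; not libstdc++'s internal state type.
enum class Kind { Char, Split, Accept };

struct State {
    Kind kind;
    char ch;           // character to consume (Kind::Char only)
    std::size_t next;  // primary successor
    std::size_t alt;   // second successor (Kind::Split only)
};

// Depth-first search over (state, input position) pairs. Pending work lives
// in a heap-allocated std::stack, so a deep search consumes heap memory
// instead of growing the call stack.
bool run_nfa(const std::vector<State>& nfa, std::size_t start,
             const std::string& input) {
    assert(start < nfa.size());
    const std::size_t width = input.size() + 1;
    std::vector<bool> seen(nfa.size() * width, false);  // visited pairs
    std::stack<std::pair<std::size_t, std::size_t> > todo;
    todo.push(std::make_pair(start, std::size_t(0)));

    while (!todo.empty()) {
        const std::pair<std::size_t, std::size_t> cur = todo.top();
        todo.pop();
        const std::size_t s = cur.first;
        const std::size_t pos = cur.second;
        if (seen[s * width + pos]) continue;
        seen[s * width + pos] = true;

        const State& st = nfa[s];
        if (st.kind == Kind::Accept) {
            if (pos == input.size()) return true;  // whole input consumed
        } else if (st.kind == Kind::Char) {
            if (pos < input.size() && input[pos] == st.ch)
                todo.push(std::make_pair(st.next, pos + 1));
        } else {
            // Kind::Split: `next` is pushed last so it is explored first.
            todo.push(std::make_pair(st.alt, pos));
            todo.push(std::make_pair(st.next, pos));
        }
    }
    return false;
}
```

For example, the pattern a*b can be encoded as four states: a Split at index 0 (next 1, alt 2), a Char 'a' at index 1 looping back to 0, a Char 'b' at index 2 leading to 3, and an Accept at index 3. Running that NFA over a very long run of 'a' followed by 'b' does not deepen the call stack, because each step is a loop iteration rather than a nested call.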
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #8 from Maksymilian Arciemowicz max at cert dot cx ---
(In reply to Tim Shen from comment #7)
> (.*{100}{100}{100}) seems to be a stack overflow. It's because the regex executor uses recursion. It could be fixed (memory exhaustion rather than a segfault) by using a std::stack to simulate the recursion; IMHO, however, directly throwing regex_error::error_space is the right thing to do here.

Yep, it's a stack overflow. Why regex_error::error_space? Wouldn't regex_error::error_stack be better?
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #9 from Tim Shen timshen at gcc dot gnu.org ---
(In reply to Maksymilian Arciemowicz from comment #8)
> (In reply to Tim Shen from comment #7)
> > (.*{100}{100}{100}) seems to be a stack overflow. It's because the regex executor uses recursion. It could be fixed (memory exhaustion rather than a segfault) by using a std::stack to simulate the recursion; IMHO, however, directly throwing regex_error::error_space is the right thing to do here.
>
> Yep, it's a stack overflow. Why regex_error::error_space? Wouldn't regex_error::error_stack be better?

Sorry for not clarifying that: I prefer throwing error_space when constructing the regex (complaining about too many states) instead of throwing error_stack when matching. To solve the latter problem, as I said, we can use a std::stack or something similar to avoid a stack overflow.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #10 from Maksymilian Arciemowicz max at cert dot cx ---
There is also one other alternative, like this:

http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/regex/regcomp.c.diff?r1=1.29&r2=1.30&f=h
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

Jonathan Wakely redi at gcc dot gnu.org changed:

What|Removed|Added
Status|RESOLVED|NEW
Last reconfirmed||2014-06-25
Resolution|INVALID|---
Summary|C11 regex memory corruption|C++11 regex memory corruption
Ever confirmed|0|1

--- Comment #3 from Jonathan Wakely redi at gcc dot gnu.org ---
(In reply to Maksymilian A from comment #2)
> cx@cx:~/REstd11/kozak5$ ./c11re '((x|'
> terminate called after throwing an instance of 'std::regex_error'
> what(): regex_error
> Aborted (core dumped)

I think this is by design.

> cx@cx:~/REstd11/kozak5$ ./c11re '((.*)()?*{100})'
> Segmentation fault (core dumped)

That's a bug.

(It would be helpful if you didn't put C11 in the subject, this has nothing to do with C)
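The source of the reporter's c11re program is not shown in this thread. As a rough sketch of how a caller can tell the two outcomes above apart, the hypothetical helper below (`try_regex` is an invented name) catches `std::regex_error` so that an invalid pattern such as '((x|' is reported instead of reaching std::terminate. It offers no protection against the segfault case, which is a crash inside the library rather than an exception.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: report what happened instead of letting a
// std::regex_error escape and terminate the program.
std::string try_regex(const std::string& pattern, const std::string& subject) {
    try {
        std::regex re(pattern);  // ECMAScript grammar by default
        return std::regex_match(subject, re) ? "match" : "no match";
    } catch (const std::regex_error& e) {
        // Invalid or rejected patterns end up here.
        return std::string("regex_error: ") + e.what();
    }
}
```

A command-line wrapper in the spirit of c11re would pass its first argument as `pattern` and print the returned string.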
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Jonathan Wakely redi at gcc dot gnu.org changed: What|Removed |Added CC||timshen at gcc dot gnu.org --- Comment #4 from Jonathan Wakely redi at gcc dot gnu.org --- That segfault is already fixed on trunk, although possibly just latent
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #5 from Maksymilian Arciemowicz max at cert dot cx ---
Thanks for the feedback. I'm going to verify this on trunk.
[Bug libstdc++/61582] C++11 regex memory corruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

--- Comment #6 from Maksymilian Arciemowicz max at cert dot cx ---
@Jonathan: true, but check this case:

cx@cx:~/REtrunk/kozak5$ ~/gccTRUNK/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/cx/gccTRUNK/bin/g++
COLLECT_LTO_WRAPPER=/home/cx/gccTRUNK/libexec/gcc/x86_64-unknown-linux-gnu/4.10.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../trunk/configure --prefix=/home/cx/gccTRUNK/ --disable-multilib
Thread model: posix
gcc version 4.10.0 20140625 (experimental) (GCC)
cx@cx:~/REtrunk/kozak5$ ~/gccTRUNK/bin/g++ c11re.c -o c11re -std=c++11
cx@cx:~/REtrunk/kozak5$ ./c11re '(.*{100}{100}{100})'
Segmentation fault (core dumped)

Program received signal SIGSEGV, Segmentation fault.
0x0041014e in std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, true>::_State_info<std::integral_constant<bool, true>, std::vector<std::sub_match<char const*>, std::allocator<std::sub_match<char const*> > > >::_M_visited(long) const ()

BR,
Maksymilian
http://cxsecurity.com/