[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
hans added a comment. In https://reviews.llvm.org/D50534#1225591, @xbolva00 wrote: > is this fixed in 7.0 release branch too? > > @hans I've merged it in r341529 now. Thanks for letting me know. Repository: rCXX libc++ https://reviews.llvm.org/D50534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
xbolva00 added subscribers: hans, xbolva00. xbolva00 added a comment. is this fixed in 7.0 release branch too? @hans Repository: rCXX libc++ https://reviews.llvm.org/D50534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
This revision was automatically updated to reflect the committed changes. Closed by commit rCXX340609: [libc++] Fix handling of negated character classes in regex (authored by ldionne, committed by ). Changed prior to commit: https://reviews.llvm.org/D50534?vs=160169&id=162377#toc Repository: rCXX libc++ https://reviews.llvm.org/D50534 Files: include/regex test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp Index: include/regex === --- include/regex +++ include/regex @@ -2414,20 +2414,17 @@ goto __exit; } } -// set of "__found" chars = +// When there's at least one of __neg_chars_ and __neg_mask_, the set +// of "__found" chars is // union(complement(union(__neg_chars_, __neg_mask_)), // other cases...) // -// __neg_chars_ and __neg_mask_'d better be handled together, as there -// are no short circuit opportunities. -// -// In addition, when __neg_mask_/__neg_chars_ is empty, they should be -// treated as all ones/all chars. +// It doesn't make sense to check this when there are no __neg_chars_ +// and no __neg_mask_. +if (!(__neg_mask_ == 0 && __neg_chars_.empty())) { - const bool __in_neg_mask = (__neg_mask_ == 0) || - __traits_.isctype(__ch, __neg_mask_); +const bool __in_neg_mask = __traits_.isctype(__ch, __neg_mask_); const bool __in_neg_chars = - __neg_chars_.empty() || std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) != __neg_chars_.end(); if (!(__in_neg_mask || __in_neg_chars)) Index: test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp === --- test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp +++ test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp @@ -18,7 +18,7 @@ #include #include -#include "test_macros.h" + // PR34310 int main() Index: test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp === --- test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp +++ test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp @@ -0,0 +1,44 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===--===// + +// +// UNSUPPORTED: c++98, c++03 + +// Make sure that we correctly match inverted character classes. + +#include +#include + + +int main() { +assert(std::regex_match("X", std::regex("[X]"))); +assert(std::regex_match("X", std::regex("[XY]"))); +assert(!std::regex_match("X", std::regex("[^X]"))); +assert(!std::regex_match("X", std::regex("[^XY]"))); + +assert(std::regex_match("X", std::regex("[\\S]"))); +assert(!std::regex_match("X", std::regex("[^\\S]"))); + +assert(!std::regex_match("X", std::regex("[\\s]"))); +assert(std::regex_match("X", std::regex("[^\\s]"))); + +assert(std::regex_match("X", std::regex("[\\s\\S]"))); +assert(std::regex_match("X", std::regex("[^Y\\s]"))); +assert(!std::regex_match("X", std::regex("[^X\\s]"))); + +assert(std::regex_match("X", std::regex("[\\w]"))); +assert(std::regex_match("_", std::regex("[\\w]"))); +assert(!std::regex_match("X", std::regex("[^\\w]"))); +assert(!std::regex_match("_", std::regex("[^\\w]"))); + +assert(!std::regex_match("X", std::regex("[\\W]"))); +assert(!std::regex_match("_", std::regex("[\\W]"))); +assert(std::regex_match("X", std::regex("[^\\W]"))); +assert(std::regex_match("_", std::regex("[^\\W]"))); +} Index: include/regex === --- include/regex +++ include/regex @@ -2414,20 +2414,17 @@ goto __exit; } } -// set of "__found" chars = +// When there's at least one of __neg_chars_ and __neg_mask_, the set +// of "__found" chars is // union(complement(union(__neg_chars_, __neg_mask_)), // other cases...) // -// __neg_chars_ and __neg_mask_'d better be handled together, as there -// are no short circuit opportunities. -// -// In addition, when __neg_mask_/__neg_chars_ is empty, they should be -// treated as all ones/all chars. +// It doesn't make sense to check this when there are no __neg_chars_ +// and no __neg_mask_.
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
timshen accepted this revision. timshen added a comment. This revision is now accepted and ready to land. That looks more correct to me, thanks! Although I'm still puzzled by the empty check at all, it's clearly an improvement. Repository: rCXX libc++ https://reviews.llvm.org/D50534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
ldionne added a comment. In https://reviews.llvm.org/D50534#1194664, @timshen wrote: > I'm not fully equipped with the context right now, but something doesn't add > up. if `__neg_chars_.empty()` check is removed, the `(__neg_mask_ == 0)` > above should be removed too. They have to be consistent. > > However, there is more weirdness in it. The comment above describes the > intention: > > union(complement(union(__neg_chars_, __neg_mask_)), other cases...) > > > With the `__neg_chars_.empty()` and `(__neg_mask_ == 0)` removed, I believe > that the code exactly matches the comment. Let's see what happens when users > don't specify any negative class or chars. `__neg_chars_` and `__neg_mask_` > will be empty sets, and `union(complement(union(__neg_chars_, __neg_mask_)), > other cases...)` always evaluate to full set, which means it always matches > all characters. This can't be right. > > It's likely that the comment description doesn't fully describe the intended > behavior. I think we need to figure that out first. Yes, I think you were right. I think what was missing is that the comment does not apply when there's no `neg_mask` and no `neg_chars`. Otherwise we get into that situation where we're taking the complement of an empty set, and we match everything, which is not what we want. Repository: rCXX libc++ https://reviews.llvm.org/D50534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
ldionne updated this revision to Diff 160169. ldionne added a comment. Rewrite the patch in light of timshen's comment, and add tests. Repository: rCXX libc++ https://reviews.llvm.org/D50534 Files: libcxx/include/regex libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp Index: libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp === --- libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp +++ libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp @@ -18,7 +18,7 @@ #include #include -#include "test_macros.h" + // PR34310 int main() Index: libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp === --- /dev/null +++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp @@ -0,0 +1,44 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===--===// + +// +// UNSUPPORTED: c++98, c++03 + +// Make sure that we correctly match inverted character classes. + +#include +#include + + +int main() { +assert(std::regex_match("X", std::regex("[X]"))); +assert(std::regex_match("X", std::regex("[XY]"))); +assert(!std::regex_match("X", std::regex("[^X]"))); +assert(!std::regex_match("X", std::regex("[^XY]"))); + +assert(std::regex_match("X", std::regex("[\\S]"))); +assert(!std::regex_match("X", std::regex("[^\\S]"))); + +assert(!std::regex_match("X", std::regex("[\\s]"))); +assert(std::regex_match("X", std::regex("[^\\s]"))); + +assert(std::regex_match("X", std::regex("[\\s\\S]"))); +assert(std::regex_match("X", std::regex("[^Y\\s]"))); +assert(!std::regex_match("X", std::regex("[^X\\s]"))); + +assert(std::regex_match("X", std::regex("[\\w]"))); +assert(std::regex_match("_", std::regex("[\\w]"))); +assert(!std::regex_match("X", std::regex("[^\\w]"))); +assert(!std::regex_match("_", std::regex("[^\\w]"))); + +assert(!std::regex_match("X", std::regex("[\\W]"))); +assert(!std::regex_match("_", std::regex("[\\W]"))); +assert(std::regex_match("X", std::regex("[^\\W]"))); +assert(std::regex_match("_", std::regex("[^\\W]"))); +} Index: libcxx/include/regex === --- libcxx/include/regex +++ libcxx/include/regex @@ -2414,20 +2414,17 @@ goto __exit; } } -// set of "__found" chars = +// When there's at least one of __neg_chars_ and __neg_mask_, the set +// of "__found" chars is // union(complement(union(__neg_chars_, __neg_mask_)), // other cases...) // -// __neg_chars_ and __neg_mask_'d better be handled together, as there -// are no short circuit opportunities. -// -// In addition, when __neg_mask_/__neg_chars_ is empty, they should be -// treated as all ones/all chars. +// It doesn't make sense to check this when there are no __neg_chars_ +// and no __neg_mask_. +if (!(__neg_mask_ == 0 && __neg_chars_.empty())) { - const bool __in_neg_mask = (__neg_mask_ == 0) || - __traits_.isctype(__ch, __neg_mask_); +const bool __in_neg_mask = __traits_.isctype(__ch, __neg_mask_); const bool __in_neg_chars = - __neg_chars_.empty() || std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) != __neg_chars_.end(); if (!(__in_neg_mask || __in_neg_chars)) Index: libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp === --- libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp +++ libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp @@ -18,7 +18,7 @@ #include #include -#include "test_macros.h" + // PR34310 int main() Index: libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp === --- /dev/null +++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp @@ -0,0 +1,44 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
timshen added a comment. I'm not fully equipped with the context right now, but something doesn't add up. if `__neg_chars_.empty()` check is removed, the `(__neg_mask_ == 0)` above should be removed too. They have to be consistent. However, there is more weirdness in it. The comment above describes the intention: union(complement(union(__neg_chars_, __neg_mask_)), other cases...) With the `__neg_chars_.empty()` and `(__neg_mask_ == 0)` removed, I believe that the code exactly matches the comment. Let's see what happens when users don't specify any negative class or chars. __neg_chars_ and __neg_mask_ will be empty sets, and `union(complement(union(__neg_chars_, __neg_mask_)), other cases...)` always evaluate to true, which means it always matches all characters. This can't be right. It's likely that the comment description doesn't fully describe the intended behavior. I think we need to figure that out first. Repository: rCXX libc++ https://reviews.llvm.org/D50534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
ldionne added a comment. The changeset that introduced this regression was reviewed as https://reviews.llvm.org/D37955. @timshen, I'm curious to understand why you or'ed with `__neg_chars_.empty()` when initializing `__in_neg_chars`. The logic's not clear to me, and my tests seem to show that this is wrong, but I don't understand the reasoning behind doing it in the first place, so I might be introducing another bug here. Please double-check. Repository: rCXX libc++ https://reviews.llvm.org/D50534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50534: [libc++] Fix handling of negated character classes in regex
ldionne created this revision. ldionne added reviewers: mclow.lists, timshen. Herald added a reviewer: EricWF. Herald added subscribers: cfe-commits, dexonsmith, christof. This commit fixes a regression introduced in r316095, where we don't match inverted character classes when there's no negated characrers in the []'s. rdar://problem/43060054 Repository: rCXX libc++ https://reviews.llvm.org/D50534 Files: libcxx/include/regex libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp Index: libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp === --- /dev/null +++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp @@ -0,0 +1,29 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===--===// + +// +// UNSUPPORTED: c++98, c++03 + +// Make sure that we correctly match inverted character classes. + +#include +#include + + +int main() { +assert(std::regex_match("X", std::regex("[\\S]"))); +assert(!std::regex_match("X", std::regex("[^\\S]"))); + +assert(!std::regex_match("X", std::regex("[\\s]"))); +assert(std::regex_match("X", std::regex("[^\\s]"))); + +assert(std::regex_match("X", std::regex("[\\s\\S]"))); +assert(std::regex_match("X", std::regex("[^Y\\s]"))); +assert(!std::regex_match("X", std::regex("[^X\\s]"))); +} Index: libcxx/include/regex === --- libcxx/include/regex +++ libcxx/include/regex @@ -2427,7 +2427,6 @@ const bool __in_neg_mask = (__neg_mask_ == 0) || __traits_.isctype(__ch, __neg_mask_); const bool __in_neg_chars = - __neg_chars_.empty() || std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) != __neg_chars_.end(); if (!(__in_neg_mask || __in_neg_chars)) Index: libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp === --- /dev/null +++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp @@ -0,0 +1,29 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===--===// + +// +// UNSUPPORTED: c++98, c++03 + +// Make sure that we correctly match inverted character classes. + +#include +#include + + +int main() { +assert(std::regex_match("X", std::regex("[\\S]"))); +assert(!std::regex_match("X", std::regex("[^\\S]"))); + +assert(!std::regex_match("X", std::regex("[\\s]"))); +assert(std::regex_match("X", std::regex("[^\\s]"))); + +assert(std::regex_match("X", std::regex("[\\s\\S]"))); +assert(std::regex_match("X", std::regex("[^Y\\s]"))); +assert(!std::regex_match("X", std::regex("[^X\\s]"))); +} Index: libcxx/include/regex === --- libcxx/include/regex +++ libcxx/include/regex @@ -2427,7 +2427,6 @@ const bool __in_neg_mask = (__neg_mask_ == 0) || __traits_.isctype(__ch, __neg_mask_); const bool __in_neg_chars = - __neg_chars_.empty() || std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) != __neg_chars_.end(); if (!(__in_neg_mask || __in_neg_chars)) ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits