[Bug sanitizer/113430] [11/12/13 only] Trivial program segfaults intermittently with ASAN with large CONFIG_ARCH_MMAP_RND_BITS in kernel configuration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113430

--- Comment #11 from Dimitrij Mijoski ---
(In reply to Sam James from comment #10)
> I don't plan on pursuing it myself, leaving it to someone else, as I can't
> reproduce on my main workstation and I don't want to faff w/ kernel config.

You should be able to modify the kernel parameter at runtime by running:

    sudo sysctl vm.mmap_rnd_bits=32

That should be enough to reproduce the issue. The fix is to cherry-pick the changes to asan_allocator.h and also to lsan_allocator.h from patch r14-263-gd53b3d94aaf211ffb2159614f5aaaf03ceb861cc. You missed lsan_allocator.h in your patch.
[Bug sanitizer/113430] Trivial program segfaults intermittently with ASAN with large CONFIG_ARCH_MMAP_RND_BITS in kernel configuration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113430

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmjpp at hotmail dot com

--- Comment #8 from Dimitrij Mijoski ---
This bug has shown up widely on the GitHub Actions CI/CD system in the last few days, most likely because Ubuntu's kernel was also updated to use 32 random bits. Here is the bug report: https://github.com/actions/runner-images/issues/9491 . It would be a good idea to backport the fix.
[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976

--- Comment #11 from Dimitrij Mijoski ---
(In reply to Jonathan Wakely from comment #10)
> I think it would be good to backport it, what do you think?

I don't really have a strong need for it. Maybe backport only to v13, as that is pretty straightforward (simple cherry-picking). Backporting to v12 and v11 will require porting other patches first.
[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976 --- Comment #9 from Dimitrij Mijoski --- I believe this bug report should be closed as resolved. Are there any plans for backporting?
[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976 --- Comment #7 from Dimitrij Mijoski --- I posted a second version of the patch about a month ago: https://gcc.gnu.org/pipermail/libstdc++/2023-March/055667.html
[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976 --- Comment #6 from Dimitrij Mijoski --- I sent a single patch to the mailing list with a good detailed commit message. I think that is better than multiple patches.
[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976 --- Comment #4 from Dimitrij Mijoski --- I have fixed this, added a large testsuite and discovered another bug in codecvt_utf16 when the input [from, from_end) contains an odd number of bytes. Error was returned instead of partial. Here are the changes in 8 commits https://github.com/gcc-mirror/gcc/compare/master...dimztimz:gcc:codecvt (read from top to bottom). Do you want everything squashed into one commit or maybe some other combination?
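The odd-byte-count case mentioned above can be illustrated with a minimal sketch. This is not the libstdc++ code: `Result` and `classify_utf16_byte_count` are hypothetical names, and the function only looks at the byte count, not at the contents of the code units.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type mirroring std::codecvt_base::result.
enum class Result { ok, partial, error };

// Counts how many whole 2-byte UTF-16 code units fit in `n` bytes and
// reports `partial` when a single trailing byte is left over.
// A lone trailing byte is an incomplete code unit, not malformed input,
// so it is reported as partial rather than error.
Result classify_utf16_byte_count(std::size_t n, std::size_t& whole_units)
{
    whole_units = n / 2;
    return (n % 2 != 0) ? Result::partial : Result::ok;
}
```

With 5 input bytes the sketch reports two whole code units and `Result::partial`, since more input could still complete the third unit.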
[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976

--- Comment #2 from Dimitrij Mijoski ---
(In reply to Dimitrij Mijoski from comment #0)
> Those that read from UCS-2 seem to me like they properly report the error.
> Reading from UTF-16 can not have this bug by definition. From what I
> checked, the functions for reading UTF-16 properly treat unpaired surrogate
> code units as error.

It seems the conversion from UCS-2 to UTF-16BE/LE is also affected. This conversion is called via codecvt_utf16::out(). See line https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B11/codecvt.cc;h=02f05752de84139a7eb7c3d40946b61f4c0334cf;hb=HEAD#l656 : it only checks for a high surrogate but should also check for a low surrogate.
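As a rough illustration of checking both surrogate ranges, here is a small sketch. The helper names are hypothetical and are not taken from codecvt.cc; only the range boundaries come from the Unicode standard.

```cpp
#include <cassert>

// High (lead) surrogates occupy U+D800..U+DBFF.
constexpr bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// Low (trail) surrogates occupy U+DC00..U+DFFF.
constexpr bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: a UCS-2 value may be written out as a single
// UTF-16 code unit only if it lies in neither surrogate range.
constexpr bool is_valid_ucs2_scalar(char16_t c)
{
    return !is_high_surrogate(c) && !is_low_surrogate(c);
}

static_assert(!is_valid_ucs2_scalar(0xD800), "high surrogate is rejected");
static_assert(!is_valid_ucs2_scalar(0xDFFF), "low surrogate is rejected");
static_assert(is_valid_ucs2_scalar(0xE000), "first value after the surrogate block");
```

A check that tests only `is_high_surrogate` would let values such as 0xDC00 through, which is the gap described above.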
[Bug libstdc++/108976] New: codecvt for Unicode allows surrogate code points
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976

Bug ID: 108976
Summary: codecvt for Unicode allows surrogate code points
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmjpp at hotmail dot com
Target Milestone: ---

Text in valid Unicode should never contain surrogate code POINTS. Those are only allowed in UTF-16, but only as code UNITS, and they must be properly paired. UTF-8 text in its strictest form must not contain surrogates, but in a slightly relaxed form surrogates can be easily encoded as 3-byte sequences. The same can be said for UTF-32 and UCS-2. Only UTF-16 is immune to the error of surrogate code POINTS (they are treated as UNITS).

Codecvts in libstdc++ currently allow surrogate code points in some cases. Here is a minimal reproduction (asserts are the correct behavior):

#include <cassert>
#include <locale>

void u32()
{
    using namespace std;
    auto& f = use_facet<codecvt<char32_t, char, mbstate_t>>(locale::classic());

    char u8str[] = "\uC800\uCBFF\uCC00\uCFFF";
    u8str[0] = u8str[3] = u8str[6] = u8str[9] = 0xED; // turn the C into D.
    // now the string is D800, DBFF, DC00 and DFFF encoded in relaxed UTF-8
    // that allows surrogate code points.
    char32_t u32str[] = {0xD800, 0xDBFF, 0xDC00, 0xDFFF, 0};

    char32_t u32out[1];
    const char* from_next;
    char32_t* to_next;
    mbstate_t st = {};
    auto res = f.in(st, u8str, u8str+3, from_next, u32out, u32out+1, to_next);
    assert(res == f.error);
    assert(from_next == u8str);
    assert(to_next == u32out);

    st = {};
    auto l = f.length(st, u8str, u8str+3, 1);
    assert(l == 0);

    char u8out[3];
    const char32_t* from_next2;
    char* to_next2;
    st = {};
    res = f.out(st, u32str, u32str+1, from_next2, u8out, u8out+3, to_next2);
    assert(res == f.error);
    assert(from_next2 == u32str);
    assert(to_next2 == u8out);
}

void u16()
{
    using namespace std;
    auto& f = use_facet<codecvt<char16_t, char, mbstate_t>>(locale::classic());

    char u8str[] = "\uC800\uCBFF\uCC00\uCFFF";
    u8str[0] = u8str[3] = u8str[6] = u8str[9] = 0xED; // turn the C into D.
    // now the string is D800, DBFF, DC00 and DFFF encoded in relaxed UTF-8
    // that allows surrogates.

    char16_t u16out[1];
    const char* from_next;
    char16_t* to_next;
    mbstate_t st = {};
    auto res = f.in(st, u8str, u8str+3, from_next, u16out, u16out+1, to_next);
    assert(res == f.error);
    assert(from_next == u8str);
    assert(to_next == u16out);

    st = {};
    auto l = f.length(st, u8str, u8str+3, 1);
    assert(l == 0);
}

int main()
{
    u32();
    u16();
}

From reading the file codecvt.cc, the following conversions have the bug:

- From UTF-8 to any other encoding.
- From UTF-32/UCS-4 to any other encoding.

Those that read from UCS-2 seem to me like they properly report the error. Reading from UTF-16 can not have this bug by definition. From what I checked, the functions for reading UTF-16 properly treat unpaired surrogate code units as error.
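For comparison, a strict encoder that refuses surrogate code points might look like the sketch below. It is an illustration under stated assumptions, not the libstdc++ implementation: `encode_utf8_strict` is a hypothetical name, and the function appends to a `std::string` instead of using the codecvt range interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of `cp` to `out`.
// Returns false, leaving `out` untouched, for surrogate code points
// (U+D800..U+DFFF) and for values above U+10FFFF.
bool encode_utf8_strict(char32_t cp, std::string& out)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate code point
    if (cp > 0x10FFFF) return false;                // outside Unicode
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

The relaxed form discussed in the report corresponds to dropping the first check, in which case U+D800 would be written as the 3-byte sequence ED A0 80.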
[Bug c++/106656] [C++23] P2513 - char8_t Compatibility and Portability Fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106656

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmjpp at hotmail dot com

--- Comment #3 from Dimitrij Mijoski ---
The documentation for the CLI flag -fchar8_t should be updated: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html . Is there any chance that this fix gets backported, since it is treated as a defect report against C++20?
[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466

--- Comment #3 from Dimitrij Mijoski ---
(In reply to Jonathan Wakely from comment #2)
> This was already fixed on master by r11-6682
> 05a30af3f237984b4dcf1dbbc17fdac583c46506

Yes, that patch mostly fixes bug 70303, too. With that patch, the asserts presented in bug 70303 pass for vector::iterator but not for deque::iterator.
[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmjpp at hotmail dot com

--- Comment #1 from Dimitrij Mijoski ---
This bug looks like a duplicate of bug 70303. The asserts presented there should be used on random-access iterators (vector, deque) to test whether N3644 is implemented.
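A check along these lines can be sketched as follows. The asserts from bug 70303 are not reproduced here; `null_iterators_compare_equal` is a hypothetical helper that tests only the core N3644 guarantee, namely that value-initialized forward iterators of the same type compare equal.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Returns true when two value-initialized iterators of the container's
// iterator type compare equal, as N3644 (adopted for C++14) requires.
template <typename Container>
bool null_iterators_compare_equal()
{
    typename Container::iterator a{};
    typename Container::iterator b{};
    return a == b && !(a != b);
}
```

Calling `null_iterators_compare_equal<std::vector<int>>()` and `null_iterators_compare_equal<std::deque<int>>()` exercises the two random-access cases named above. In Debug Mode builds that predate the fix, the comparison itself may abort instead of returning.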
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419 --- Comment #12 from Dimitrij Mijoski --- Hello Jonathan. I posted a patch for this bug, which I hope you'll find useful once you start working on this. https://gcc.gnu.org/pipermail/libstdc++/2020-September/051073.html
[Bug other/97076] clang-format file does not work for some C++11 code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97076

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Dimitrij Mijoski ---
Fixed in https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=172178c0c35e1dabb778c80c26dc872136c45cf5
[Bug other/97076] New: clang-format file does not work for some C++11 code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97076

Bug ID: 97076
Summary: clang-format file does not work for some C++11 code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: dmjpp at hotmail dot com
Target Milestone: ---

This is not a bug in GCC, but in the supporting files in contrib. The clang-format file has Standard: Cpp03 near its end. See
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=contrib/clang-format;h=7a4e96f64ca64a332ca5f945f08425c3a3e045c6;hb=HEAD#l150

I was writing a libstdc++ patch where I used the C++11 feature Unicode string literals:

    auto x = u"abc";

clang-format adds a space between the u and the quotation marks, making the file invalid C++:

    auto x = u "abc";

The fix is to change the standard to Auto:

    Standard: Auto
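The reason the inserted space matters can be seen in a small sketch, given here only as an illustration. In C++11 and later, `u"abc"` is a single token whose element type is `char16_t`; once separated, `u` is parsed as an ordinary identifier.

```cpp
#include <cassert>
#include <type_traits>

// u"abc" is one token: a char16_t string literal (C++11).
// Written as `u "abc"`, the same characters would be an identifier `u`
// followed by an ordinary string literal, which does not compile here.
const auto* x = u"abc";

static_assert(std::is_same<decltype(x), const char16_t*>::value,
              "u\"abc\" has element type char16_t");
```

A formatter configured for C++03 has no notion of the `u` prefix, which is consistent with it treating `u` and `"abc"` as two separate tokens.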
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #10 from Dimitrij Mijoski ---
I was wrong in comment #9. The bug and the proposed fix in comment #7 are OK.

While writing some tests for error I discovered yet another bug in UTF-8 decoding. See the example:

// 2 code points, both are 4 byte in UTF-8.
const char u8in[] = u8"\U0010\U0010";
const char32_t u32in[] = U"\U0010\U0010";

void
utf8_to_utf32_in_error_7 (const codecvt<char32_t, char, mbstate_t> &cvt)
{
  char in[7] = {};
  char32_t out[3] = {};
  char_traits<char>::copy (in, u8in, 7);
  in[5] = 'z';
  // Last CP has two errors. Its second code unit is malformed and it
  // misses its last code unit. Because it misses its last CU, the
  // decoder returns too early, reporting that it is incomplete.
  // It should return invalid.

  auto state = mbstate_t{};
  auto in_next = (const char *) nullptr;
  auto out_next = (char32_t *) nullptr;
  auto res = codecvt_base::result ();

  res = cvt.in (state, in, in + 7, in_next, out, out + 3, out_next);
  VERIFY (res == cvt.error); // incorrectly returns partial
  VERIFY (in_next == in + 4);
  VERIFY (out_next == out + 1);
  VERIFY (out[0] == u32in[0] && out[1] == 0 && out[2] == 0);
}

I published the full testsuite on GitHub, licensed under GPL v3+ of course: https://github.com/dimztimz/codecvt_test/blob/master/codecvt.cpp .

I was thinking of sending a patch, but after this latest bug, the 4th, I see this needs more time. Maybe a testsuite from another library like ICU can be incorporated? Well, whatever, I will pause my work on this.
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419 --- Comment #9 from Dimitrij Mijoski --- Ignore my last comment; here is the corrected version. Looking again at my proposed fix in comment #7, I concluded it is not the best fix. It will fix the testsuite in the same comment #7, but I discovered another class of errors related to the lines I am touching in that proposed fix. The error occurs when an incomplete sequence is in the middle of the from range, and not at the end. In such cases codecvt_base::error should be returned. The bug exists in UTF8->UTF16, UTF8->UCS4 and UTF16->UCS4. I guess some more tests need to be written about returning error.
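The distinction between a sequence broken in the middle of the input and one merely cut off at the end can be sketched as below. This is a simplified illustration, not the libstdc++ decoder: `Result`, `expected_length` and `check_utf8` are hypothetical names, and only the lead byte and the continuation-byte pattern are checked (no overlong or surrogate checks).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type mirroring std::codecvt_base::result.
enum class Result { ok, partial, error };

// Length a UTF-8 sequence should have, judged by its lead byte;
// 0 means the byte cannot start a sequence.
inline std::size_t expected_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// `consumed` is set to the number of bytes that form complete,
// well-formed sequences before the function stops.
Result check_utf8(const unsigned char* p, std::size_t n, std::size_t& consumed)
{
    consumed = 0;
    while (consumed < n) {
        const std::size_t len = expected_length(p[consumed]);
        if (len == 0) return Result::error;
        // Validate every continuation byte that is actually present,
        // before deciding whether the sequence is merely truncated.
        for (std::size_t i = 1; i < len && consumed + i < n; ++i)
            if ((p[consumed + i] & 0xC0) != 0x80)
                return Result::error;   // broken in the middle of the range
        if (consumed + len > n)
            return Result::partial;     // cut off only at the very end
        consumed += len;
    }
    return Result::ok;
}
```

The point of the ordering is that the bytes which are present get validated before the length check, so a malformed continuation byte yields `error` even when the sequence is also too short.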
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419 --- Comment #8 from Dimitrij Mijoski --- Looking again at my proposed fix in comment #6, I concluded it is not the best fix. It will fix the testsuite in the same comment #6, but I discovered another class of errors related to the lines I am touching in that proposed fix. The error occurs when an incomplete sequence is in the middle of the from range, and not at the end. In such cases codecvt_base::error should be returned. The bug exists in UTF8->UTF16, UTF8->UCS4 and UTF16->UCS4. I guess some more tests need to be written about returning error.
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #7 from Dimitrij Mijoski ---
I think I found a related bug in the UTF8 to UCS2 codecvt, codecvt_utf8<char16_t>. It can be tested with the following example:

#include <cassert>
#include <codecvt>
#include <iostream>
#include <memory>

auto test_u8_ucs2_in()
{
    // 2 code points, one is 3 bytes and the other is 4 bytes in UTF-8.
    // in UTF-16 the first is single unit, the second is surrogate pair
    // in UCS2 only the first CP is allowed.
    const char* in = u8"\u\U0010";
    char16_t out[2] = { 'y' , 'y' };

    auto cvt_ptr = make_unique<codecvt_utf8<char16_t>>();
    auto& cvt = *cvt_ptr;
    auto state = mbstate_t{};
    auto in_ptr = in;
    auto out_ptr = out;

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    auto res = cvt.in(state, in, in + 2, in_ptr, out, out, out_ptr);
    assert(res == cvt.partial); // BUG, returns OK, should be Partial
    assert(out_ptr == out);
    assert(in_ptr == in);

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 2, in_ptr, out, out + 1, out_ptr);
    assert(res == cvt.partial); // BUG, returns ERROR, should be Partial
    assert(out_ptr == out);
    assert(in_ptr == in);

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 3, in_ptr, out, out, out_ptr);
    assert(res == cvt.partial); // BUG, returns OK, should be Partial
    assert(out_ptr == out);
    assert(in_ptr == in);

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 3, in_ptr, out, out + 1, out_ptr);
    assert(res == cvt.ok);
    assert(out_ptr == out + 1);
    assert(in_ptr == in + 3);
    cout << "UCS2 sequence: " << hex << out[0] << ' ' << out[1] << '\n';

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 6, in_ptr, out, out + 1, out_ptr);
    assert(res == cvt.partial); // BUG, returns OK, should be Partial
    assert(out_ptr == out + 1);
    assert(in_ptr == in + 3);

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 6, in_ptr, out, out + 2, out_ptr);
    assert(res == cvt.partial); // BUG, returns ERROR, should be Partial
    assert(out_ptr == out + 1);
    assert(in_ptr == in + 3);

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 7, in_ptr, out, out + 1, out_ptr);
    assert(res == cvt.partial); // BUG, returns OK, should be Partial
    assert(out_ptr == out + 1);
    assert(in_ptr == in + 3);

    state = {}; in_ptr = nullptr; out_ptr = nullptr;
    res = cvt.in(state, in, in + 7, in_ptr, out, out + 2, out_ptr);
    assert(res == cvt.error);
    assert(out_ptr == out + 1);
    assert(in_ptr == in + 3);
}

The bug lies in the same function utf16_in() I mentioned in comment #5, in lines 544-547:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B11/codecvt.cc;h=0311b15177d0439757e0347f7934b5a09b78f8e3;hb=HEAD#l544

Those lines:

544 if (s == surrogates::allowed)
545   return codecvt_base::partial;
546 else
547   return codecvt_base::error; // No surrogates in UCS2

Should simply be one line:

544 return codecvt_base::partial;
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #5 from Dimitrij Mijoski ---
I think I found where the bug lies. It lies in:

1. Line 557 of the file c++11/codecvt.cc
   https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B11/codecvt.cc;h=0311b15177d0439757e0347f7934b5a09b78f8e3;hb=HEAD#l557 .
   The return of the function utf16_in() should be:

   return from.size() ? codecvt_base::partial : codecvt_base::ok;

   The bug is triggered when the loop exits because to.size() is zero. from.size() should be checked as well.

2. Line 579 of the same file
   https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B11/codecvt.cc;h=0311b15177d0439757e0347f7934b5a09b78f8e3;hb=HEAD#l579

   578 if (from.size() < 2)
   579   return codecvt_base::ok; // stop converting at this point

   Should be

   578 if (from.size() < 2)
   579   return codecvt_base::partial; // stop converting at this point
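The first point can be illustrated with a stand-in conversion loop. This is a sketch, not utf16_in(): `Result` and `widen_ascii` are hypothetical, and the "conversion" is just widening ASCII bytes, chosen so the loop structure and the final return are easy to follow.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type mirroring std::codecvt_base::result.
enum class Result { ok, partial, error };

// Widens ASCII bytes into char16_t until either the input or the
// output range is exhausted, advancing both pointers as it goes.
Result widen_ascii(const char*& from, const char* from_end,
                   char16_t*& to, char16_t* to_end)
{
    while (from != from_end && to != to_end) {
        const unsigned char c = static_cast<unsigned char>(*from);
        if (c >= 0x80) return Result::error; // not ASCII
        *to++ = c;
        ++from;
    }
    // The loop may have stopped because the output is full.
    // Unread input then means the conversion is only partial.
    return from != from_end ? Result::partial : Result::ok;
}
```

Returning `ok` unconditionally after the loop would hide the case where three input bytes meet a two-element output buffer, which is the same pattern as the unconditional return described in point 1.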
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|codecvt::in() and out()     |...>::in() and out()
                   |incorrectly return partial  |incorrectly return ok in
                   |in some cases.              |some cases.

--- Comment #4 from Dimitrij Mijoski ---
I think I found where the bug lies. It lies in 1. line 557 of the file c++11/codecvt.cc https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B11/codecvt.cc;h=0311b15177d0439757e0347f7934b5a09b78f8e3;hb=HEAD#l557 . The return of the function utf16_in() should be: return from.size() ? codecvt_base::partial : codecvt_base::ok; The bug is triggered when the loop exits because to.size() is zero. from.size() should be checked as well. 2.
[Bug libstdc++/85494] implementation of random_device on mingw is useless
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494 --- Comment #6 from Dimitrij Mijoski --- I read the patch a couple of times and it seems completely OK. Can you push it to the repository (or to a fork)?
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return partial in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #44359|0                           |1
        is obsolete|                            |

--- Comment #2 from Dimitrij Mijoski ---
Created attachment 44360 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44360&action=edit
better test cases with proper asserts

In the previous file the asserts matched the buggy behavior, and the expected behavior was noted only in comments. In this file the asserts check the expected behavior.
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return partial in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419 --- Comment #1 from Dimitrij Mijoski --- Created attachment 44359 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44359&action=edit test cases that trigger the bug
[Bug libstdc++/86419] New: codecvt::in() and out() incorrectly return partial in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

Bug ID: 86419
Summary: codecvt::in() and out() incorrectly return partial in some cases.
Product: gcc
Version: 7.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmjpp at hotmail dot com
Target Milestone: ---

I have created a bunch of test cases, and some of them fail unexpectedly. I'll post the code as an attachment; the lines with the bug contain the word "bug".
[Bug libstdc++/85494] implementation of random_device on mingw is useless
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494 --- Comment #3 from Dimitrij Mijoski --- Created attachment 44358 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44358&action=edit implements proper random_device for mingw-w64
[Bug libstdc++/85494] implementation of random_device on mingw is useless
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494

Dimitrij Mijoski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmjpp at hotmail dot com

--- Comment #2 from Dimitrij Mijoski ---
I have created a patch that fixes this for mingw-w64 using rand_s() https://msdn.microsoft.com/en-us/library/sxtz2fa8.aspx . It does not work with mingw.org, where it falls back to mt19937 as it does now, because mingw.org does not declare rand_s() in its headers.