[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-06-09 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment. Sorry this got stalled, it's important (I hadn't realized we were crashing on *indexing* when boost was included). I've created D81530 derived from this which addresses the previous round of comments and adds a testcase for indexing.

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-21 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment. > Proposing a fix to extend the ascii handling code to take extended ascii as > well. Please reduce the scope of this patch to not crashing on invalid utf-8. Handling other encodings is in principle possible, but requires more work and a more precise understanding of

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-21 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added a comment. also could you clang-format your changes CHANGES SINCE LAST ACTION https://reviews.llvm.org/D74731/new/ https://reviews.llvm.org/D74731 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-b

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-21 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added inline comments. Comment at: clang-tools-extra/clangd/SourceCode.cpp:70 size_t UTF8Length = llvm::countLeadingOnes(C); -// 0xxx is ASCII, handled above. 10xxx is a trailing byte, invalid here. -// 1xxx is not valid UTF-8 at all. Assert because it's

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Yancheng Zheng via Phabricator via cfe-commits
AnakinZheng updated this revision to Diff 245036. AnakinZheng added a comment. Implement @sammccall 's suggestion and add test. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D74731/new/ https://reviews.llvm.org/D74731 Files: clang-tools-extra/clangd/SourceCode.cpp clang-tools-extra/

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Yancheng Zheng via Phabricator via cfe-commits
AnakinZheng marked an inline comment as done. AnakinZheng added inline comments. Comment at: clang-tools-extra/clangd/SourceCode.cpp:62 unsigned char C = static_cast(U8[I]); -if (LLVM_LIKELY(!(C & 0x80))) { // ASCII character. +if (LLVM_LIKELY(!(C & 0x100))) { // ASC

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added inline comments. Comment at: clang-tools-extra/clangd/SourceCode.cpp:62 unsigned char C = static_cast(U8[I]); -if (LLVM_LIKELY(!(C & 0x80))) { // ASCII character. +if (LLVM_LIKELY(!(C & 0x100))) { // ASCII or extended ASCII character. if (CB(1,

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Yancheng Zheng via Phabricator via cfe-commits
AnakinZheng marked an inline comment as done. AnakinZheng added inline comments. Comment at: clang-tools-extra/clangd/SourceCode.cpp:62 unsigned char C = static_cast(U8[I]); -if (LLVM_LIKELY(!(C & 0x80))) { // ASCII character. +if (LLVM_LIKELY(!(C & 0x100))) { // ASC

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added inline comments. Comment at: clang-tools-extra/clangd/SourceCode.cpp:62 unsigned char C = static_cast(U8[I]); -if (LLVM_LIKELY(!(C & 0x80))) { // ASCII character. +if (LLVM_LIKELY(!(C & 0x100))) { // ASCII or extended ASCII character. if (CB(1,

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment. Specifically, we can't change the return value for any valid utf-8 string (see tests). Instead, where we currently assert length is between 2 and 4, you could change that to an if and treat it (arbitrarily) as a one-utf-8-byte, one-utf-16-code-unit character. This n

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Yancheng Zheng via Phabricator via cfe-commits
AnakinZheng marked an inline comment as done. AnakinZheng added a comment. Update patch. Comment at: clang-tools-extra/clangd/SourceCode.cpp:62 unsigned char C = static_cast(U8[I]); -if (LLVM_LIKELY(!(C & 0x80))) { // ASCII character. +if (LLVM_LIKELY(!(C & 0x100))

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Yancheng Zheng via Phabricator via cfe-commits
AnakinZheng updated this revision to Diff 245021. AnakinZheng added a comment. Fix the previous stupid mistake. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D74731/new/ https://reviews.llvm.org/D74731 Files: clang-tools-extra/clangd/SourceCode.cpp Index: clang-tools-extra/clangd/So

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment. Yeah I think there must be some confusion about what this code is doing. It's specifically iterating over the unicode codepoints of what are supposed to be UTF-8-encoded input bytes. The input turns out sometimes not to be UTF-8 (e.g. the file on disk is ISO-8859-1 a

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Kadir Cetinkaya via Phabricator via cfe-commits
kadircet added inline comments. Comment at: clang-tools-extra/clangd/SourceCode.cpp:62 unsigned char C = static_cast(U8[I]); -if (LLVM_LIKELY(!(C & 0x80))) { // ASCII character. +if (LLVM_LIKELY(!(C & 0x100))) { // ASCII or extended ASCII character. if (CB(1,

[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

2020-02-17 Thread Yancheng Zheng via Phabricator via cfe-commits
AnakinZheng created this revision. AnakinZheng added reviewers: kadircet, sammccall. AnakinZheng added a project: clang. Herald added subscribers: cfe-commits, usaxena95, arphaman, jkorous, MaskRay, ilya-biryukov. Previously this piece of code asserts when processing extended ASCII. Proposing a