[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
riccibruno added a comment. Is anyone working on this ? This is the only persistent msan failure when doing `check-clang` on my machine. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D42043/new/ https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
vsk added subscribers: jkorous, vsapsai. vsk added a comment. + Jan and Volodymyr. This seemed to be in good shape the last time I looked at it. Not having touched libclang for a while I don't think I can give an official lgtm. Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande added a comment. hi @vsk + @arphaman , it's been a while -- I brought this old diff up to date, wondering if you could take a look again. Thanks! Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande updated this revision to Diff 161370. elsteveogrande added a comment. Rebase + fix conflicts for very old diff. Works again. `ninja check-clang` with MSAN-enabled build: Before: Failing Tests (2): Clang :: CodeGen/signed_metadata.cpp Clang :: Index/comment-to-html-xml-conversion.cpp Expected Passes: 12777 Expected Failures : 19 Unsupported Tests : 291 Unexpected Failures: 2 FAILED: tools/clang/test/CMakeFiles/check-clang cd /data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test && /bin/python2.7 /data/users/steveo/llvm-ct/build/msan/clang/./bin/llvm-lit -sv --param clang_site_config=/data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test/lit.site.cfg --param USE_Z3_SOLVER=0 /data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test ninja: build stopped: subcommand failed. After: Failing Tests (1): Clang :: CodeGen/signed_metadata.cpp Expected Passes: 12778 Expected Failures : 19 Unsupported Tests : 291 Unexpected Failures: 1 FAILED: tools/clang/test/CMakeFiles/check-clang cd /data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test && /bin/python2.7 /data/users/steveo/llvm-ct/build/msan/clang/./bin/llvm-lit -sv --param clang_site_config=/data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test/lit.site.cfg --param USE_Z3_SOLVER=0 /data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test ninja: build stopped: subcommand failed. Repository: rC Clang https://reviews.llvm.org/D42043 Files: include/clang-c/CXString.h tools/c-index-test/c-index-test.c tools/libclang/CXString.cpp Index: tools/libclang/CXString.cpp === --- tools/libclang/CXString.cpp +++ tools/libclang/CXString.cpp @@ -15,99 +15,130 @@ #include "CXString.h" #include "CXTranslationUnit.h" +#include "clang-c/CXString.h" #include "clang-c/Index.h" #include "clang/Frontend/ASTUnit.h" #include "llvm/Support/ErrorHandling.h" +#include using namespace clang; -/// Describes the kind of underlying data in CXString. -enum CXStringFlag { - /// CXString contains a 'const char *' that it doesn't own. - CXS_Unmanaged, - - /// CXString contains a 'const char *' that it allocated with malloc(). - CXS_Malloc, - - /// CXString contains a CXStringBuf that needs to be returned to the - /// CXStringPool. - CXS_StringBuf -}; +static_assert(sizeof(CXString) <= 16, ""); namespace clang { namespace cxstring { +/** + * This is for \b CXString 's which are created with \b CreateRef(StringRef). + * We'll store the info from the input \b StringRef: char ptr and size. + * + * We don't know for sure whether this is null-terminated so, when and if + * \b clang_getCString is called for this \b CXString, we'll allocate C string + * storage and copy data into the storage. We'll memo-ize that in the + * \b CString member. + * + * This is refcounted; the \b Count member is initially 1. When a \b CXString + * instance using this object is disposed via \b clang_disposeString, \b Count + * is decremented. When this string is duplicated the \b Count increases. + * + * When \b Count finally drops to zero, the ptr at \b CString, and this object, + * should be deleted. + */ +struct RefCountedCharRange { + const char *Data; + const char *CString; + unsigned Size; + unsigned Count; +}; + //===--===// // Basic generation of CXStrings. //===--===// -CXString createEmpty() { +CXString createRef(const char *String) { CXString Str; - Str.data = ""; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) String; + if (String) { +Str.Size = strlen(String); +Str.IsNullTerminated = true; + } else { +Str.Size = 0; +Str.IsNullTerminated = false; + } + Str.IsOwned = false; + Str.IsPooled = false; return Str; } -CXString createNull() { - CXString Str; - Str.data = nullptr; - Str.private_flags = CXS_Unmanaged; - return Str; +CXString createEmpty() { + return createRef(""); } -CXString createRef(const char *String) { - if (String && String[0] == '\0') -return createEmpty(); +CXString createNull() { + return createRef(nullptr); +} - CXString Str; - Str.data = String; - Str.private_flags = CXS_Unmanaged; - return Str; +inline static const char *copyCharRange(const char *CS, unsigned Size) { + char *Spelling = (char *) malloc(Size + 1); + assert(Spelling); + if (CS) { +memcpy(Spelling, CS, Size); + } + Spelling[Size] = 0; + return Spelling; } -CXString createDup(const char *String) { - if (!String) + CXString createDup(const char *String) { + if (!String) { return createNull(); - - if (String[0] == '\0') + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = strdup(String); - Str.private_flags =
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
vitalybuka added a comment. Is this stale? Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande added a comment. Hi @vsk + @arphaman : I did find such problems with either MSAN (with poison-in-dtor) and ASAN, so we should be able to discover these instances pretty easily :) Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
vsk added a reviewer: arphaman. vsk added a comment. In https://reviews.llvm.org/D42043#981409, @elsteveogrande wrote: > Fixes, but first, a question for reviewers: > > Looking at the description of `clang_disposeString`: > > /** >* \brief Free the given string. >*/ > CINDEX_LINKAGE void clang_disposeString(CXString string); > > > Does it seem incorrect to acquire some `const char *` pointer into this > string, dispose the string, and then access the characters? I'm inclined to say that this is incorrect, but skimming through the codebase it seems like we "get away" with this behavior all over the place. > I've seen this happen a couple of times now. As I make changes to my code I > run into this pattern. (Since now this triggers a use-after-free abort.) > > I wanted to ask because, though it seemed obvious to me that this is > incorrect usage, I'm now wondering if the expectation is that it's ok. I've CC'd some Apple folks who can say more definitively how difficult it'd be to move away from this behavior. IMO it should be doable as long as there's some way to audit the codebase (e.g with ASan). > Or maybe wasn't technically ok, and we just "got away with it" before. :) > > > Anyway, assuming it's only correct to use the string before disposing it, > then the fixes this time around are: > > - Fix an ASAN use-after-free in `c-index-test` mentioned above. Get the > `CXString`, pass it around, then dispose when we're done with it. > - Change `createEmpty` and `createNull` to delegate through `createRef` > - `createRef` tolerates `nullptr` correctly. > - I previously ran into ASAN aborts due to mixing malloc/free/operator > new/delete. I had changed everything to use operator new/delete. However > from C-land I can't `new char[length]`, only `malloc` (or `strdup` or > whatever). Change everything to malloc/free instead. Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande updated this revision to Diff 130557. elsteveogrande added a comment. remove unneeded changes Repository: rC Clang https://reviews.llvm.org/D42043 Files: include/clang-c/CXString.h tools/c-index-test/c-index-test.c tools/libclang/CXString.cpp Index: tools/libclang/CXString.cpp === --- tools/libclang/CXString.cpp +++ tools/libclang/CXString.cpp @@ -15,99 +15,130 @@ #include "CXString.h" #include "CXTranslationUnit.h" +#include "clang-c/CXString.h" #include "clang-c/Index.h" #include "clang/Frontend/ASTUnit.h" #include "llvm/Support/ErrorHandling.h" +#include -using namespace clang; - -/// Describes the kind of underlying data in CXString. -enum CXStringFlag { - /// CXString contains a 'const char *' that it doesn't own. - CXS_Unmanaged, - - /// CXString contains a 'const char *' that it allocated with malloc(). - CXS_Malloc, +static_assert(sizeof(CXString) <= 16, ""); - /// CXString contains a CXStringBuf that needs to be returned to the - /// CXStringPool. - CXS_StringBuf -}; +using namespace clang; namespace clang { namespace cxstring { +/** + * This is for \b CXString 's which are created with \b CreateRef(StringRef). + * We'll store the info from the input \b StringRef: char ptr and size. + * + * We don't know for sure whether this is null-terminated so, when and if + * \b clang_getCString is called for this \b CXString, we'll allocate C string + * storage and copy data into the storage. We'll memo-ize that in the + * \b CString member. + * + * This is refcounted; the \b Count member is initially 1. When a \b CXString + * instance using this object is disposed via \b clang_disposeString, \b Count + * is decremented. When this string is duplicated the \b Count increases. + * + * When \b Count finally drops to zero, the ptr at \b CString, and this object, + * should be deleted. + */ +struct RefCountedCharRange { + const char *Data; + const char *CString; + unsigned Size; + unsigned Count; +}; + //===--===// // Basic generation of CXStrings. //===--===// -CXString createEmpty() { +CXString createRef(const char *String) { CXString Str; - Str.data = ""; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) String; + if (String) { +Str.Size = strlen(String); +Str.IsNullTerminated = true; + } else { +Str.Size = 0; +Str.IsNullTerminated = false; + } + Str.IsOwned = false; + Str.IsPooled = false; return Str; } -CXString createNull() { - CXString Str; - Str.data = nullptr; - Str.private_flags = CXS_Unmanaged; - return Str; +CXString createEmpty() { + return createRef(""); } -CXString createRef(const char *String) { - if (String && String[0] == '\0') -return createEmpty(); +CXString createNull() { + return createRef(nullptr); +} - CXString Str; - Str.data = String; - Str.private_flags = CXS_Unmanaged; - return Str; +inline static const char *copyCharRange(const char *CS, unsigned Size) { + char *Spelling = (char *) malloc(Size + 1); + assert(Spelling); + if (CS) { +memcpy(Spelling, CS, Size); + } + Spelling[Size] = 0; + return Spelling; } CXString createDup(const char *String) { - if (!String) + if (!String) { return createNull(); - - if (String[0] == '\0') + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = strdup(String); - Str.private_flags = CXS_Malloc; + Str.Size = strlen(String); + Str.Contents = (const void *) copyCharRange(String, Str.Size); + Str.IsNullTerminated = true; + Str.IsOwned = true; + Str.IsPooled = false; return Str; } CXString createRef(StringRef String) { - // If the string is not nul-terminated, we have to make a copy. - - // FIXME: This is doing a one past end read, and should be removed! For memory - // we don't manage, the API string can become unterminated at any time outside - // our control. - - if (!String.empty() && String.data()[String.size()] != 0) -return createDup(String); - - CXString Result; - Result.data = String.data(); - Result.private_flags = (unsigned) CXS_Unmanaged; - return Result; + assert (String.size() <= std::numeric_limits::max()); + CXString Str; + Str.Size = unsigned(String.size()); + Str.IsNullTerminated = false; + Str.IsOwned = false; + Str.IsPooled = false; + auto *RC = new RefCountedCharRange { +/* Data */ String.data(), +/* CString */ nullptr, +/* Size */ Str.Size, +/* Count */ 1, + }; + Str.Contents = (const void *) RC; + return Str; } CXString createDup(StringRef String) { - CXString Result; - char *Spelling = static_cast(malloc(String.size() + 1)); - memmove(Spelling, String.data(), String.size()); - Spelling[String.size()] = 0; - Result.data = Spelling; - Result.private_flags = (unsigned)
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande updated this revision to Diff 130556. elsteveogrande marked an inline comment as done. elsteveogrande added a comment. Fixes, but first, a question for reviewers: Looking at the description of `clang_disposeString`: /** * \brief Free the given string. */ CINDEX_LINKAGE void clang_disposeString(CXString string); Does it seem incorrect to acquire some `const char *` pointer into this string, dispose the string, and then access the characters? I've seen this happen a couple of times now. As I make changes to my code I run into this pattern. (Since now this triggers a use-after-free abort.) I wanted to ask because, though it seemed obvious to me that this is incorrect usage, I'm now wondering if the expectation is that it's ok. Or maybe wasn't technically ok, and we just "got away with it" before. :) Anyway, assuming it's only correct to use the string before disposing it, then the fixes this time around are: - Fix an ASAN use-after-free in `c-index-test` mentioned above. Get the `CXString`, pass it around, then dispose when we're done with it. - Change `createEmpty` and `createNull` to delegate through `createRef` - `createRef` tolerates `nullptr` correctly. - I previously ran into ASAN aborts due to mixing malloc/free/operator new/delete. I had changed everything to use operator new/delete. However from C-land I can't `new char[length]`, only `malloc` (or `strdup` or whatever). Change everything to malloc/free instead. Repository: rC Clang https://reviews.llvm.org/D42043 Files: .watchmanconfig include/clang-c/CXString.h tools/c-index-test/c-index-test.c tools/libclang/CXString.cpp tools/libclang/CXString.h Index: tools/libclang/CXString.h === --- tools/libclang/CXString.h +++ tools/libclang/CXString.h @@ -106,4 +106,3 @@ } #endif - Index: tools/libclang/CXString.cpp === --- tools/libclang/CXString.cpp +++ tools/libclang/CXString.cpp @@ -15,99 +15,130 @@ #include "CXString.h" #include "CXTranslationUnit.h" +#include "clang-c/CXString.h" #include "clang-c/Index.h" #include "clang/Frontend/ASTUnit.h" #include "llvm/Support/ErrorHandling.h" +#include -using namespace clang; - -/// Describes the kind of underlying data in CXString. -enum CXStringFlag { - /// CXString contains a 'const char *' that it doesn't own. - CXS_Unmanaged, - - /// CXString contains a 'const char *' that it allocated with malloc(). - CXS_Malloc, +static_assert(sizeof(CXString) <= 16, ""); - /// CXString contains a CXStringBuf that needs to be returned to the - /// CXStringPool. - CXS_StringBuf -}; +using namespace clang; namespace clang { namespace cxstring { +/** + * This is for \b CXString 's which are created with \b CreateRef(StringRef). + * We'll store the info from the input \b StringRef: char ptr and size. + * + * We don't know for sure whether this is null-terminated so, when and if + * \b clang_getCString is called for this \b CXString, we'll allocate C string + * storage and copy data into the storage. We'll memo-ize that in the + * \b CString member. + * + * This is refcounted; the \b Count member is initially 1. When a \b CXString + * instance using this object is disposed via \b clang_disposeString, \b Count + * is decremented. When this string is duplicated the \b Count increases. + * + * When \b Count finally drops to zero, the ptr at \b CString, and this object, + * should be deleted. + */ +struct RefCountedCharRange { + const char *Data; + const char *CString; + unsigned Size; + unsigned Count; +}; + //===--===// // Basic generation of CXStrings. //===--===// -CXString createEmpty() { +CXString createRef(const char *String) { CXString Str; - Str.data = ""; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) String; + if (String) { +Str.Size = strlen(String); +Str.IsNullTerminated = true; + } else { +Str.Size = 0; +Str.IsNullTerminated = false; + } + Str.IsOwned = false; + Str.IsPooled = false; return Str; } -CXString createNull() { - CXString Str; - Str.data = nullptr; - Str.private_flags = CXS_Unmanaged; - return Str; +CXString createEmpty() { + return createRef(""); } -CXString createRef(const char *String) { - if (String && String[0] == '\0') -return createEmpty(); +CXString createNull() { + return createRef(nullptr); +} - CXString Str; - Str.data = String; - Str.private_flags = CXS_Unmanaged; - return Str; +inline static const char *copyCharRange(const char *CS, unsigned Size) { + char *Spelling = (char *) malloc(Size + 1); + assert(Spelling); + if (CS) { +memcpy(Spelling, CS, Size); + } + Spelling[Size] = 0; + return Spelling; } CXString createDup(const char *String) { -
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
vsk added a comment. Mind rebasing this when you have a chance? Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande marked 2 inline comments as done. elsteveogrande added inline comments. Comment at: tools/c-index-test/c-index-test.c:3268 - filename = clang_getFileName(file); - index_data->main_filename = clang_getCString(filename); - clang_disposeString(filename); + index_data->main_filename = clang_getFileName(file); elsteveogrande wrote: > vsk wrote: > > This looks like a separate bug fix. Is it possible to separate the > > main_filename changes from this patch? > Will do! Done: https://reviews.llvm.org/D42259 Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
vsk added a comment. Thanks for working on this :). Comment at: tools/libclang/CXString.cpp:213 + if (string.IsNullTerminated) { +CString = (const char *) string.Contents; + } else { elsteveogrande wrote: > vsk wrote: > > Basic question: If a non-owning CXString is null-terminated, what provides > > the guarantee that the string is in fact valid when getCString() is called? > > Is the user of the C API responsible for ensuring the lifetime of the > > string is valid? > I believe the API itself is the one building `CXString` instances, and the > user of the C API doesn't really create them, only use them. So the API has > to ensure the string stays "good" while there may be references to it. > > (Which feels a little fragile. But I think that's the tradeoff being made. > You'll get either "fast" strings, or data guaranteed to be sane. I'd opt for > safer data but I don't know who's using this C API and am afraid to introduce > a serious perf regression. So it'll stay this way and I'll try my best to > solve *-SAN issues with these constraints :) ) Sgtm, it doesn't look like this is altering the API contract. Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande added a comment. Thanks very much for looking at this @vsk ! I actually found an ASAN bug in my new code, mixing and matching `malloc/free` and `operator`s `new/delete`. Comment at: tools/c-index-test/c-index-test.c:3268 - filename = clang_getFileName(file); - index_data->main_filename = clang_getCString(filename); - clang_disposeString(filename); + index_data->main_filename = clang_getFileName(file); vsk wrote: > This looks like a separate bug fix. Is it possible to separate the > main_filename changes from this patch? Will do! Comment at: tools/libclang/CXString.cpp:59 CXString createEmpty() { CXString Str; + Str.Contents = (const void *) ""; vsk wrote: > Why shouldn't this be defined as createRef("")? Hmm, good question. `createRef` below actually calls back to `createEmpty` and `createNull` in the nonnull-but-empty and null cases. I think I'll do this the other way around, and let `createRef` have the responsibility of dealing with these fields and nulls and whatnot. Comment at: tools/libclang/CXString.cpp:213 + if (string.IsNullTerminated) { +CString = (const char *) string.Contents; + } else { vsk wrote: > Basic question: If a non-owning CXString is null-terminated, what provides > the guarantee that the string is in fact valid when getCString() is called? > Is the user of the C API responsible for ensuring the lifetime of the string > is valid? I believe the API itself is the one building `CXString` instances, and the user of the C API doesn't really create them, only use them. So the API has to ensure the string stays "good" while there may be references to it. (Which feels a little fragile. But I think that's the tradeoff being made. You'll get either "fast" strings, or data guaranteed to be sane. I'd opt for safer data but I don't know who's using this C API and am afraid to introduce a serious perf regression. So it'll stay this way and I'll try my best to solve *-SAN issues with these constraints :) ) Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
vsk added reviewers: akyrtzi, benlangmuir. vsk added inline comments. Comment at: tools/c-index-test/c-index-test.c:3268 - filename = clang_getFileName(file); - index_data->main_filename = clang_getCString(filename); - clang_disposeString(filename); + index_data->main_filename = clang_getFileName(file); This looks like a separate bug fix. Is it possible to separate the main_filename changes from this patch? Comment at: tools/libclang/CXString.cpp:59 CXString createEmpty() { CXString Str; + Str.Contents = (const void *) ""; Why shouldn't this be defined as createRef("")? Comment at: tools/libclang/CXString.cpp:213 + if (string.IsNullTerminated) { +CString = (const char *) string.Contents; + } else { Basic question: If a non-owning CXString is null-terminated, what provides the guarantee that the string is in fact valid when getCString() is called? Is the user of the C API responsible for ensuring the lifetime of the string is valid? Comment at: tools/libclang/CXString.h:108 #endif Generally unrelated whitespace changes should be left out of patches. Repository: rC Clang https://reviews.llvm.org/D42043 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande updated this revision to Diff 130225. elsteveogrande added a comment. Add a needed null-check on input string's data ptr. Repository: rC Clang https://reviews.llvm.org/D42043 Files: include/clang-c/CXString.h tools/c-index-test/c-index-test.c tools/libclang/CXString.cpp tools/libclang/CXString.h Index: tools/libclang/CXString.h === --- tools/libclang/CXString.h +++ tools/libclang/CXString.h @@ -106,4 +106,3 @@ } #endif - Index: tools/libclang/CXString.cpp === --- tools/libclang/CXString.cpp +++ tools/libclang/CXString.cpp @@ -15,99 +15,141 @@ #include "CXString.h" #include "CXTranslationUnit.h" +#include "clang-c/CXString.h" #include "clang-c/Index.h" #include "clang/Frontend/ASTUnit.h" #include "llvm/Support/ErrorHandling.h" +#include -using namespace clang; - -/// Describes the kind of underlying data in CXString. -enum CXStringFlag { - /// CXString contains a 'const char *' that it doesn't own. - CXS_Unmanaged, +static_assert(sizeof(CXString) <= 16, ""); - /// CXString contains a 'const char *' that it allocated with malloc(). - CXS_Malloc, - - /// CXString contains a CXStringBuf that needs to be returned to the - /// CXStringPool. - CXS_StringBuf -}; +using namespace clang; namespace clang { namespace cxstring { +/** + * This is for \b CXString 's which are created with \b CreateRef(StringRef). + * We'll store the info from the input \b StringRef: char ptr and size. + * + * We don't know for sure whether this is null-terminated so, when and if + * \b clang_getCString is called for this \b CXString, we'll allocate C string + * storage and copy data into the storage. We'll memo-ize that in the + * \b CString member. + * + * This is refcounted; the \b Count member is initially 1. When a \b CXString + * instance using this object is disposed via \b clang_disposeString, \b Count + * is decremented. When this string is duplicated the \b Count increases. + * + * When \b Count finally drops to zero, the ptr at \b CString, and this object, + * should be freed with \b free . + */ +struct RefCountedCharRange { + const char *Data; + const char *CString; + unsigned Size; + unsigned Count; +}; + //===--===// // Basic generation of CXStrings. //===--===// CXString createEmpty() { CXString Str; - Str.data = ""; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) ""; + Str.Size = 0; + Str.IsNullTerminated = true; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } CXString createNull() { CXString Str; - Str.data = nullptr; - Str.private_flags = CXS_Unmanaged; + Str.Contents = nullptr; + Str.Size = 0; + Str.IsNullTerminated = false; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } CXString createRef(const char *String) { - if (String && String[0] == '\0') + if (!String) { +return createNull(); + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = String; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) String; + Str.Size = strlen(String); + Str.IsNullTerminated = true; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } +inline static const char *copyCharRange(const char *CS, unsigned Size) { + char *Spelling = static_cast(malloc(Size + 1)); + memcpy(Spelling, CS, Size); + Spelling[Size] = 0; + return Spelling; +} + CXString createDup(const char *String) { - if (!String) + if (!String) { return createNull(); - - if (String[0] == '\0') + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = strdup(String); - Str.private_flags = CXS_Malloc; + Str.Size = strlen(String); + Str.Contents = (const void *) copyCharRange(String, Str.Size); + Str.IsNullTerminated = true; + Str.IsOwned = true; + Str.IsPooled = false; return Str; } CXString createRef(StringRef String) { - // If the string is not nul-terminated, we have to make a copy. - - // FIXME: This is doing a one past end read, and should be removed! For memory - // we don't manage, the API string can become unterminated at any time outside - // our control. - - if (!String.empty() && String.data()[String.size()] != 0) -return createDup(String); - - CXString Result; - Result.data = String.data(); - Result.private_flags = (unsigned) CXS_Unmanaged; - return Result; + assert (String.size() <= std::numeric_limits::max()); + CXString Str; + Str.Size = unsigned(String.size()); + Str.IsNullTerminated = false; + Str.IsOwned = false; + Str.IsPooled = false; + auto *RC = new RefCountedCharRange { +/* Data */ String.data(), +/* CString */ nullptr, +/* Size */ Str.Size, +/* Count */ 1, + }; +
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande updated this revision to Diff 129959. elsteveogrande added a comment. Change string-copy-on-demand logic; do only if `not IsNullTerminated`. Repository: rC Clang https://reviews.llvm.org/D42043 Files: include/clang-c/CXString.h tools/c-index-test/c-index-test.c tools/libclang/CXString.cpp tools/libclang/CXString.h Index: tools/libclang/CXString.h === --- tools/libclang/CXString.h +++ tools/libclang/CXString.h @@ -106,4 +106,3 @@ } #endif - Index: tools/libclang/CXString.cpp === --- tools/libclang/CXString.cpp +++ tools/libclang/CXString.cpp @@ -15,99 +15,141 @@ #include "CXString.h" #include "CXTranslationUnit.h" +#include "clang-c/CXString.h" #include "clang-c/Index.h" #include "clang/Frontend/ASTUnit.h" #include "llvm/Support/ErrorHandling.h" +#include -using namespace clang; - -/// Describes the kind of underlying data in CXString. -enum CXStringFlag { - /// CXString contains a 'const char *' that it doesn't own. - CXS_Unmanaged, +static_assert(sizeof(CXString) <= 16, ""); - /// CXString contains a 'const char *' that it allocated with malloc(). - CXS_Malloc, - - /// CXString contains a CXStringBuf that needs to be returned to the - /// CXStringPool. - CXS_StringBuf -}; +using namespace clang; namespace clang { namespace cxstring { +/** + * This is for \b CXString 's which are created with \b CreateRef(StringRef). + * We'll store the info from the input \b StringRef: char ptr and size. + * + * We don't know for sure whether this is null-terminated so, when and if + * \b clang_getCString is called for this \b CXString, we'll allocate C string + * storage and copy data into the storage. We'll memo-ize that in the + * \b CString member. + * + * This is refcounted; the \b Count member is initially 1. When a \b CXString + * instance using this object is disposed via \b clang_disposeString, \b Count + * is decremented. When this string is duplicated the \b Count increases. + * + * When \b Count finally drops to zero, the ptr at \b CString, and this object, + * should be freed with \b free . + */ +struct RefCountedCharRange { + const char *Data; + const char *CString; + unsigned Size; + unsigned Count; +}; + //===--===// // Basic generation of CXStrings. //===--===// CXString createEmpty() { CXString Str; - Str.data = ""; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) ""; + Str.Size = 0; + Str.IsNullTerminated = true; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } CXString createNull() { CXString Str; - Str.data = nullptr; - Str.private_flags = CXS_Unmanaged; + Str.Contents = nullptr; + Str.Size = 0; + Str.IsNullTerminated = false; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } CXString createRef(const char *String) { - if (String && String[0] == '\0') + if (!String) { +return createNull(); + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = String; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) String; + Str.Size = strlen(String); + Str.IsNullTerminated = true; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } +inline static const char *copyCharRange(const char *CS, unsigned Size) { + char *Spelling = static_cast(malloc(Size + 1)); + memcpy(Spelling, CS, Size); + Spelling[Size] = 0; + return Spelling; +} + CXString createDup(const char *String) { - if (!String) + if (!String) { return createNull(); - - if (String[0] == '\0') + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = strdup(String); - Str.private_flags = CXS_Malloc; + Str.Size = strlen(String); + Str.Contents = (const void *) copyCharRange(String, Str.Size); + Str.IsNullTerminated = true; + Str.IsOwned = true; + Str.IsPooled = false; return Str; } CXString createRef(StringRef String) { - // If the string is not nul-terminated, we have to make a copy. - - // FIXME: This is doing a one past end read, and should be removed! For memory - // we don't manage, the API string can become unterminated at any time outside - // our control. - - if (!String.empty() && String.data()[String.size()] != 0) -return createDup(String); - - CXString Result; - Result.data = String.data(); - Result.private_flags = (unsigned) CXS_Unmanaged; - return Result; + assert (String.size() <= std::numeric_limits::max()); + CXString Str; + Str.Size = unsigned(String.size()); + Str.IsNullTerminated = false; + Str.IsOwned = false; + Str.IsPooled = false; + auto *RC = new RefCountedCharRange { +/* Data */ String.data(), +/* CString */ nullptr, +/* Size */ Str.Size, +/*
[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug
elsteveogrande created this revision. Herald added a subscriber: cfe-commits. Previous impl would read the byte past the end of a string (a `llvm::StringRef`), possibly exceeding the allocation for that memory and raising an MSAN issue. This will instead save the character range specified by the `StringRef` and copy it on-demand to a new C string. Passes existing tests, and now passes with MSAN as well. Repository: rC Clang https://reviews.llvm.org/D42043 Files: include/clang-c/CXString.h tools/c-index-test/c-index-test.c tools/libclang/CXString.cpp tools/libclang/CXString.h Index: tools/libclang/CXString.h === --- tools/libclang/CXString.h +++ tools/libclang/CXString.h @@ -106,4 +106,3 @@ } #endif - Index: tools/libclang/CXString.cpp === --- tools/libclang/CXString.cpp +++ tools/libclang/CXString.cpp @@ -15,99 +15,141 @@ #include "CXString.h" #include "CXTranslationUnit.h" +#include "clang-c/CXString.h" #include "clang-c/Index.h" #include "clang/Frontend/ASTUnit.h" #include "llvm/Support/ErrorHandling.h" +#include -using namespace clang; - -/// Describes the kind of underlying data in CXString. -enum CXStringFlag { - /// CXString contains a 'const char *' that it doesn't own. - CXS_Unmanaged, +static_assert(sizeof(CXString) <= 16, ""); - /// CXString contains a 'const char *' that it allocated with malloc(). - CXS_Malloc, - - /// CXString contains a CXStringBuf that needs to be returned to the - /// CXStringPool. - CXS_StringBuf -}; +using namespace clang; namespace clang { namespace cxstring { +/** + * This is for \b CXString 's which are created with \b CreateRef(StringRef). + * We'll store the info from the input \b StringRef: char ptr and size. + * + * We don't know for sure whether this is null-terminated so, when and if + * \b clang_getCString is called for this \b CXString, we'll allocate C string + * storage and copy data into the storage. We'll memo-ize that in the + * \b CString member. + * + * This is refcounted; the \b Count member is initially 1. When a \b CXString + * instance using this object is disposed via \b clang_disposeString, \b Count + * is decremented. When this string is duplicated the \b Count increases. + * + * When \b Count finally drops to zero, the ptr at \b CString, and this object, + * should be freed with \b free . + */ +struct RefCountedCharRange { + const char *Data; + const char *CString; + unsigned Size; + unsigned Count; +}; + //===--===// // Basic generation of CXStrings. //===--===// CXString createEmpty() { CXString Str; - Str.data = ""; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) ""; + Str.Size = 0; + Str.IsNullTerminated = true; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } CXString createNull() { CXString Str; - Str.data = nullptr; - Str.private_flags = CXS_Unmanaged; + Str.Contents = nullptr; + Str.Size = 0; + Str.IsNullTerminated = false; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } CXString createRef(const char *String) { - if (String && String[0] == '\0') + if (!String) { +return createNull(); + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = String; - Str.private_flags = CXS_Unmanaged; + Str.Contents = (const void *) String; + Str.Size = strlen(String); + Str.IsNullTerminated = true; + Str.IsOwned = false; + Str.IsPooled = false; return Str; } +inline static const char *copyCharRange(const char *CS, unsigned Size) { + char *Spelling = static_cast(malloc(Size + 1)); + memcpy(Spelling, CS, Size); + Spelling[Size] = 0; + return Spelling; +} + CXString createDup(const char *String) { - if (!String) + if (!String) { return createNull(); - - if (String[0] == '\0') + } + if (String[0] == '\0') { return createEmpty(); + } CXString Str; - Str.data = strdup(String); - Str.private_flags = CXS_Malloc; + Str.Size = strlen(String); + Str.Contents = (const void *) copyCharRange(String, Str.Size); + Str.IsNullTerminated = true; + Str.IsOwned = true; + Str.IsPooled = false; return Str; } CXString createRef(StringRef String) { - // If the string is not nul-terminated, we have to make a copy. - - // FIXME: This is doing a one past end read, and should be removed! For memory - // we don't manage, the API string can become unterminated at any time outside - // our control. - - if (!String.empty() && String.data()[String.size()] != 0) -return createDup(String); - - CXString Result; - Result.data = String.data(); - Result.private_flags = (unsigned) CXS_Unmanaged; - return Result; + assert (String.size() <= std::numeric_limits::max()); + CXString Str;