[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-12-20 Thread Bruno Ricci via Phabricator via cfe-commits
riccibruno added a comment.

Is anyone working on this ? This is the only persistent msan failure when doing 
`check-clang` on my machine.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D42043/new/

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-08-20 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added subscribers: jkorous, vsapsai.
vsk added a comment.

+ Jan and Volodymyr. This seemed to be in good shape the last time I looked at 
it. Not having touched libclang for a while I don't think I can give an 
official lgtm.


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-08-18 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande added a comment.

hi @vsk + @arphaman , it's been a while -- I brought this old diff up to date, 
wondering if you could take a look again.  Thanks!


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-08-17 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande updated this revision to Diff 161370.
elsteveogrande added a comment.

Rebase + fix conflicts for very old diff.  Works again.

`ninja check-clang` with MSAN-enabled build:

Before:

  Failing Tests (2):
  Clang :: CodeGen/signed_metadata.cpp
  Clang :: Index/comment-to-html-xml-conversion.cpp
  
Expected Passes: 12777
Expected Failures  : 19
Unsupported Tests  : 291
Unexpected Failures: 2
  FAILED: tools/clang/test/CMakeFiles/check-clang
  cd /data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test && 
/bin/python2.7 /data/users/steveo/llvm-ct/build/msan/clang/./bin/llvm-lit -sv 
--param 
clang_site_config=/data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test/lit.site.cfg
 --param USE_Z3_SOLVER=0 
/data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test
  ninja: build stopped: subcommand failed.

After:

  Failing Tests (1):
  Clang :: CodeGen/signed_metadata.cpp
  
Expected Passes: 12778
Expected Failures  : 19
Unsupported Tests  : 291
Unexpected Failures: 1
  FAILED: tools/clang/test/CMakeFiles/check-clang
  cd /data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test && 
/bin/python2.7 /data/users/steveo/llvm-ct/build/msan/clang/./bin/llvm-lit -sv 
--param 
clang_site_config=/data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test/lit.site.cfg
 --param USE_Z3_SOLVER=0 
/data/users/steveo/llvm-ct/build/msan/clang/tools/clang/test
  ninja: build stopped: subcommand failed.


Repository:
  rC Clang

https://reviews.llvm.org/D42043

Files:
  include/clang-c/CXString.h
  tools/c-index-test/c-index-test.c
  tools/libclang/CXString.cpp

Index: tools/libclang/CXString.cpp
===
--- tools/libclang/CXString.cpp
+++ tools/libclang/CXString.cpp
@@ -15,99 +15,130 @@
 
 #include "CXString.h"
 #include "CXTranslationUnit.h"
+#include "clang-c/CXString.h"
 #include "clang-c/Index.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "llvm/Support/ErrorHandling.h"
+#include 
 
 using namespace clang;
 
-/// Describes the kind of underlying data in CXString.
-enum CXStringFlag {
-  /// CXString contains a 'const char *' that it doesn't own.
-  CXS_Unmanaged,
-
-  /// CXString contains a 'const char *' that it allocated with malloc().
-  CXS_Malloc,
-
-  /// CXString contains a CXStringBuf that needs to be returned to the
-  /// CXStringPool.
-  CXS_StringBuf
-};
+static_assert(sizeof(CXString) <= 16, "");
 
 namespace clang {
 namespace cxstring {
 
+/**
+ * This is for \b CXString 's which are created with \b CreateRef(StringRef).
+ * We'll store the info from the input \b StringRef: char ptr and size.
+ *
+ * We don't know for sure whether this is null-terminated so, when and if
+ * \b clang_getCString is called for this \b CXString, we'll allocate C string
+ * storage and copy data into the storage.  We'll memo-ize that in the
+ * \b CString member.
+ *
+ * This is refcounted; the \b Count member is initially 1.  When a \b CXString
+ * instance using this object is disposed via \b clang_disposeString, \b Count
+ * is decremented.  When this string is duplicated the \b Count increases.
+ *
+ * When \b Count finally drops to zero, the ptr at \b CString, and this object,
+ * should be deleted.
+ */
+struct RefCountedCharRange {
+  const char *Data;
+  const char *CString;
+  unsigned Size;
+  unsigned Count;
+};
+
 //===--===//
 // Basic generation of CXStrings.
 //===--===//
 
-CXString createEmpty() {
+CXString createRef(const char *String) {
   CXString Str;
-  Str.data = "";
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) String;
+  if (String) {
+Str.Size = strlen(String);
+Str.IsNullTerminated = true;
+  } else {
+Str.Size = 0;
+Str.IsNullTerminated = false;
+  }
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
-CXString createNull() {
-  CXString Str;
-  Str.data = nullptr;
-  Str.private_flags = CXS_Unmanaged;
-  return Str;
+CXString createEmpty() {
+  return createRef("");
 }
 
-CXString createRef(const char *String) {
-  if (String && String[0] == '\0')
-return createEmpty();
+CXString createNull() {
+  return createRef(nullptr);
+}
 
-  CXString Str;
-  Str.data = String;
-  Str.private_flags = CXS_Unmanaged;
-  return Str;
+inline static const char *copyCharRange(const char *CS, unsigned Size) {
+  char *Spelling = (char *) malloc(Size + 1);
+  assert(Spelling);
+  if (CS) {
+memcpy(Spelling, CS, Size);
+  }
+  Spelling[Size] = 0;
+  return Spelling;
 }
 
-CXString createDup(const char *String) {
-  if (!String)
+ CXString createDup(const char *String) {
+  if (!String) {
 return createNull();
-
-  if (String[0] == '\0')
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = strdup(String);
-  Str.private_flags = 

[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-06-13 Thread Vitaly Buka via Phabricator via cfe-commits
vitalybuka added a comment.

Is this stale?


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-23 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande added a comment.

Hi @vsk + @arphaman : I did find such problems with either MSAN (with 
poison-in-dtor) and ASAN, so we should be able to discover these instances 
pretty easily :)


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-22 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added a reviewer: arphaman.
vsk added a comment.

In https://reviews.llvm.org/D42043#981409, @elsteveogrande wrote:

> Fixes, but first, a question for reviewers:
>
> Looking at the description of `clang_disposeString`:
>
>   /**
>* \brief Free the given string.
>*/
>   CINDEX_LINKAGE void clang_disposeString(CXString string);
>   
>
> Does it seem incorrect to acquire some `const char *` pointer into this 
> string, dispose the string, and then access the characters?


I'm inclined to say that this is incorrect, but skimming through the codebase 
it seems like we "get away" with this behavior all over the place.

> I've seen this happen a couple of times now.  As I make changes to my code I 
> run into this pattern.  (Since now this triggers a use-after-free abort.)
> 
> I wanted to ask because, though it seemed obvious to me that this is 
> incorrect usage, I'm now wondering if the expectation is that it's ok.

I've CC'd some Apple folks who can say more definitively how difficult it'd be 
to move away from this behavior. IMO it should be doable as long as there's 
some way to audit the codebase (e.g with ASan).

>   Or maybe wasn't technically ok, and we just "got away with it" before.  :)
>
> 
> Anyway, assuming it's only correct to use the string before disposing it, 
> then the fixes this time around are:
> 
> - Fix an ASAN use-after-free in `c-index-test` mentioned above.  Get the 
> `CXString`, pass it around, then dispose when we're done with it.
> - Change `createEmpty` and `createNull` to delegate through `createRef`
> - `createRef` tolerates `nullptr` correctly.
> - I previously ran into ASAN aborts due to mixing malloc/free/operator 
> new/delete.  I had changed everything to use operator new/delete.  However 
> from C-land I can't `new char[length]`, only `malloc` (or `strdup` or 
> whatever).  Change everything to malloc/free instead.


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-18 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande updated this revision to Diff 130557.
elsteveogrande added a comment.

remove unneeded changes


Repository:
  rC Clang

https://reviews.llvm.org/D42043

Files:
  include/clang-c/CXString.h
  tools/c-index-test/c-index-test.c
  tools/libclang/CXString.cpp

Index: tools/libclang/CXString.cpp
===
--- tools/libclang/CXString.cpp
+++ tools/libclang/CXString.cpp
@@ -15,99 +15,130 @@
 
 #include "CXString.h"
 #include "CXTranslationUnit.h"
+#include "clang-c/CXString.h"
 #include "clang-c/Index.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "llvm/Support/ErrorHandling.h"
+#include 
 
-using namespace clang;
-
-/// Describes the kind of underlying data in CXString.
-enum CXStringFlag {
-  /// CXString contains a 'const char *' that it doesn't own.
-  CXS_Unmanaged,
-
-  /// CXString contains a 'const char *' that it allocated with malloc().
-  CXS_Malloc,
+static_assert(sizeof(CXString) <= 16, "");
 
-  /// CXString contains a CXStringBuf that needs to be returned to the
-  /// CXStringPool.
-  CXS_StringBuf
-};
+using namespace clang;
 
 namespace clang {
 namespace cxstring {
 
+/**
+ * This is for \b CXString 's which are created with \b CreateRef(StringRef).
+ * We'll store the info from the input \b StringRef: char ptr and size.
+ *
+ * We don't know for sure whether this is null-terminated so, when and if
+ * \b clang_getCString is called for this \b CXString, we'll allocate C string
+ * storage and copy data into the storage.  We'll memo-ize that in the
+ * \b CString member.
+ *
+ * This is refcounted; the \b Count member is initially 1.  When a \b CXString
+ * instance using this object is disposed via \b clang_disposeString, \b Count
+ * is decremented.  When this string is duplicated the \b Count increases.
+ *
+ * When \b Count finally drops to zero, the ptr at \b CString, and this object,
+ * should be deleted.
+ */
+struct RefCountedCharRange {
+  const char *Data;
+  const char *CString;
+  unsigned Size;
+  unsigned Count;
+};
+
 //===--===//
 // Basic generation of CXStrings.
 //===--===//
 
-CXString createEmpty() {
+CXString createRef(const char *String) {
   CXString Str;
-  Str.data = "";
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) String;
+  if (String) {
+Str.Size = strlen(String);
+Str.IsNullTerminated = true;
+  } else {
+Str.Size = 0;
+Str.IsNullTerminated = false;
+  }
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
-CXString createNull() {
-  CXString Str;
-  Str.data = nullptr;
-  Str.private_flags = CXS_Unmanaged;
-  return Str;
+CXString createEmpty() {
+  return createRef("");
 }
 
-CXString createRef(const char *String) {
-  if (String && String[0] == '\0')
-return createEmpty();
+CXString createNull() {
+  return createRef(nullptr);
+}
 
-  CXString Str;
-  Str.data = String;
-  Str.private_flags = CXS_Unmanaged;
-  return Str;
+inline static const char *copyCharRange(const char *CS, unsigned Size) {
+  char *Spelling = (char *) malloc(Size + 1);
+  assert(Spelling);
+  if (CS) {
+memcpy(Spelling, CS, Size);
+  }
+  Spelling[Size] = 0;
+  return Spelling;
 }
 
 CXString createDup(const char *String) {
-  if (!String)
+  if (!String) {
 return createNull();
-
-  if (String[0] == '\0')
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = strdup(String);
-  Str.private_flags = CXS_Malloc;
+  Str.Size = strlen(String);
+  Str.Contents = (const void *) copyCharRange(String, Str.Size);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = true;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(StringRef String) {
-  // If the string is not nul-terminated, we have to make a copy.
-
-  // FIXME: This is doing a one past end read, and should be removed! For memory
-  // we don't manage, the API string can become unterminated at any time outside
-  // our control.
-
-  if (!String.empty() && String.data()[String.size()] != 0)
-return createDup(String);
-
-  CXString Result;
-  Result.data = String.data();
-  Result.private_flags = (unsigned) CXS_Unmanaged;
-  return Result;
+  assert (String.size() <= std::numeric_limits::max());
+  CXString Str;
+  Str.Size = unsigned(String.size());
+  Str.IsNullTerminated = false;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
+  auto *RC = new RefCountedCharRange {
+/* Data */ String.data(),
+/* CString */ nullptr,
+/* Size */ Str.Size,
+/* Count */ 1,
+  };
+  Str.Contents = (const void *) RC;
+  return Str;
 }
 
 CXString createDup(StringRef String) {
-  CXString Result;
-  char *Spelling = static_cast(malloc(String.size() + 1));
-  memmove(Spelling, String.data(), String.size());
-  Spelling[String.size()] = 0;
-  Result.data = Spelling;
-  Result.private_flags = (unsigned) 

[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-18 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande updated this revision to Diff 130556.
elsteveogrande marked an inline comment as done.
elsteveogrande added a comment.

Fixes, but first, a question for reviewers:

Looking at the description of `clang_disposeString`:

  /**
   * \brief Free the given string.
   */
  CINDEX_LINKAGE void clang_disposeString(CXString string);

Does it seem incorrect to acquire some `const char *` pointer into this string, 
dispose the string, and then access the characters?

I've seen this happen a couple of times now.  As I make changes to my code I 
run into this pattern.  (Since now this triggers a use-after-free abort.)

I wanted to ask because, though it seemed obvious to me that this is incorrect 
usage, I'm now wondering if the expectation is that it's ok.  Or maybe wasn't 
technically ok, and we just "got away with it" before.  :)

Anyway, assuming it's only correct to use the string before disposing it, then 
the fixes this time around are:

- Fix an ASAN use-after-free in `c-index-test` mentioned above.  Get the 
`CXString`, pass it around, then dispose when we're done with it.
- Change `createEmpty` and `createNull` to delegate through `createRef`
- `createRef` tolerates `nullptr` correctly.
- I previously ran into ASAN aborts due to mixing malloc/free/operator 
new/delete.  I had changed everything to use operator new/delete.  However from 
C-land I can't `new char[length]`, only `malloc` (or `strdup` or whatever).  
Change everything to malloc/free instead.


Repository:
  rC Clang

https://reviews.llvm.org/D42043

Files:
  .watchmanconfig
  include/clang-c/CXString.h
  tools/c-index-test/c-index-test.c
  tools/libclang/CXString.cpp
  tools/libclang/CXString.h

Index: tools/libclang/CXString.h
===
--- tools/libclang/CXString.h
+++ tools/libclang/CXString.h
@@ -106,4 +106,3 @@
 }
 
 #endif
-
Index: tools/libclang/CXString.cpp
===
--- tools/libclang/CXString.cpp
+++ tools/libclang/CXString.cpp
@@ -15,99 +15,130 @@
 
 #include "CXString.h"
 #include "CXTranslationUnit.h"
+#include "clang-c/CXString.h"
 #include "clang-c/Index.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "llvm/Support/ErrorHandling.h"
+#include 
 
-using namespace clang;
-
-/// Describes the kind of underlying data in CXString.
-enum CXStringFlag {
-  /// CXString contains a 'const char *' that it doesn't own.
-  CXS_Unmanaged,
-
-  /// CXString contains a 'const char *' that it allocated with malloc().
-  CXS_Malloc,
+static_assert(sizeof(CXString) <= 16, "");
 
-  /// CXString contains a CXStringBuf that needs to be returned to the
-  /// CXStringPool.
-  CXS_StringBuf
-};
+using namespace clang;
 
 namespace clang {
 namespace cxstring {
 
+/**
+ * This is for \b CXString 's which are created with \b CreateRef(StringRef).
+ * We'll store the info from the input \b StringRef: char ptr and size.
+ *
+ * We don't know for sure whether this is null-terminated so, when and if
+ * \b clang_getCString is called for this \b CXString, we'll allocate C string
+ * storage and copy data into the storage.  We'll memo-ize that in the
+ * \b CString member.
+ *
+ * This is refcounted; the \b Count member is initially 1.  When a \b CXString
+ * instance using this object is disposed via \b clang_disposeString, \b Count
+ * is decremented.  When this string is duplicated the \b Count increases.
+ *
+ * When \b Count finally drops to zero, the ptr at \b CString, and this object,
+ * should be deleted.
+ */
+struct RefCountedCharRange {
+  const char *Data;
+  const char *CString;
+  unsigned Size;
+  unsigned Count;
+};
+
 //===--===//
 // Basic generation of CXStrings.
 //===--===//
 
-CXString createEmpty() {
+CXString createRef(const char *String) {
   CXString Str;
-  Str.data = "";
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) String;
+  if (String) {
+Str.Size = strlen(String);
+Str.IsNullTerminated = true;
+  } else {
+Str.Size = 0;
+Str.IsNullTerminated = false;
+  }
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
-CXString createNull() {
-  CXString Str;
-  Str.data = nullptr;
-  Str.private_flags = CXS_Unmanaged;
-  return Str;
+CXString createEmpty() {
+  return createRef("");
 }
 
-CXString createRef(const char *String) {
-  if (String && String[0] == '\0')
-return createEmpty();
+CXString createNull() {
+  return createRef(nullptr);
+}
 
-  CXString Str;
-  Str.data = String;
-  Str.private_flags = CXS_Unmanaged;
-  return Str;
+inline static const char *copyCharRange(const char *CS, unsigned Size) {
+  char *Spelling = (char *) malloc(Size + 1);
+  assert(Spelling);
+  if (CS) {
+memcpy(Spelling, CS, Size);
+  }
+  Spelling[Size] = 0;
+  return Spelling;
 }
 
 CXString createDup(const char *String) {
- 

[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-18 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added a comment.

Mind rebasing this when you have a chance?


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-18 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande marked 2 inline comments as done.
elsteveogrande added inline comments.



Comment at: tools/c-index-test/c-index-test.c:3268
 
-  filename = clang_getFileName(file);
-  index_data->main_filename = clang_getCString(filename);
-  clang_disposeString(filename);
+  index_data->main_filename = clang_getFileName(file);
 

elsteveogrande wrote:
> vsk wrote:
> > This looks like a separate bug fix. Is it possible to separate the 
> > main_filename changes from this patch?
> Will do!
Done: https://reviews.llvm.org/D42259


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-18 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added a comment.

Thanks for working on this :).




Comment at: tools/libclang/CXString.cpp:213
+  if (string.IsNullTerminated) {
+CString = (const char *) string.Contents;
+  } else {

elsteveogrande wrote:
> vsk wrote:
> > Basic question: If a non-owning CXString is null-terminated, what provides 
> > the guarantee that the string is in fact valid when getCString() is called? 
> > Is the user of the C API responsible for ensuring the lifetime of the 
> > string is valid?
> I believe the API itself is the one building `CXString` instances, and the 
> user of the C API doesn't really create them, only use them.  So the API has 
> to ensure the string stays "good" while there may be references to it.
> 
> (Which feels a little fragile.  But I think that's the tradeoff being made.  
> You'll get either "fast" strings, or data guaranteed to be sane.  I'd opt for 
> safer data but I don't know who's using this C API and am afraid to introduce 
> a serious perf regression.  So it'll stay this way and I'll try my best to 
> solve *-SAN issues with these constraints :) )
Sgtm, it doesn't look like this is altering the API contract.


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-18 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande added a comment.

Thanks very much for looking at this @vsk !  I actually found an ASAN bug in my 
new code, mixing and matching `malloc/free` and `operator`s `new/delete`.




Comment at: tools/c-index-test/c-index-test.c:3268
 
-  filename = clang_getFileName(file);
-  index_data->main_filename = clang_getCString(filename);
-  clang_disposeString(filename);
+  index_data->main_filename = clang_getFileName(file);
 

vsk wrote:
> This looks like a separate bug fix. Is it possible to separate the 
> main_filename changes from this patch?
Will do!



Comment at: tools/libclang/CXString.cpp:59
 CXString createEmpty() {
   CXString Str;
+  Str.Contents = (const void *) "";

vsk wrote:
> Why shouldn't this be defined as createRef("")?
Hmm, good question.  `createRef` below actually calls back to `createEmpty` and 
`createNull` in the nonnull-but-empty and null cases.

I think I'll do this the other way around, and let `createRef` have the 
responsibility of dealing with these fields and nulls and whatnot.



Comment at: tools/libclang/CXString.cpp:213
+  if (string.IsNullTerminated) {
+CString = (const char *) string.Contents;
+  } else {

vsk wrote:
> Basic question: If a non-owning CXString is null-terminated, what provides 
> the guarantee that the string is in fact valid when getCString() is called? 
> Is the user of the C API responsible for ensuring the lifetime of the string 
> is valid?
I believe the API itself is the one building `CXString` instances, and the user 
of the C API doesn't really create them, only use them.  So the API has to 
ensure the string stays "good" while there may be references to it.

(Which feels a little fragile.  But I think that's the tradeoff being made.  
You'll get either "fast" strings, or data guaranteed to be sane.  I'd opt for 
safer data but I don't know who's using this C API and am afraid to introduce a 
serious perf regression.  So it'll stay this way and I'll try my best to solve 
*-SAN issues with these constraints :) )


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-17 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added reviewers: akyrtzi, benlangmuir.
vsk added inline comments.



Comment at: tools/c-index-test/c-index-test.c:3268
 
-  filename = clang_getFileName(file);
-  index_data->main_filename = clang_getCString(filename);
-  clang_disposeString(filename);
+  index_data->main_filename = clang_getFileName(file);
 

This looks like a separate bug fix. Is it possible to separate the 
main_filename changes from this patch?



Comment at: tools/libclang/CXString.cpp:59
 CXString createEmpty() {
   CXString Str;
+  Str.Contents = (const void *) "";

Why shouldn't this be defined as createRef("")?



Comment at: tools/libclang/CXString.cpp:213
+  if (string.IsNullTerminated) {
+CString = (const char *) string.Contents;
+  } else {

Basic question: If a non-owning CXString is null-terminated, what provides the 
guarantee that the string is in fact valid when getCString() is called? Is the 
user of the C API responsible for ensuring the lifetime of the string is valid?



Comment at: tools/libclang/CXString.h:108
 
 #endif

Generally unrelated whitespace changes should be left out of patches.


Repository:
  rC Clang

https://reviews.llvm.org/D42043



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-17 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande updated this revision to Diff 130225.
elsteveogrande added a comment.

Add a needed null-check on input string's data ptr.


Repository:
  rC Clang

https://reviews.llvm.org/D42043

Files:
  include/clang-c/CXString.h
  tools/c-index-test/c-index-test.c
  tools/libclang/CXString.cpp
  tools/libclang/CXString.h

Index: tools/libclang/CXString.h
===
--- tools/libclang/CXString.h
+++ tools/libclang/CXString.h
@@ -106,4 +106,3 @@
 }
 
 #endif
-
Index: tools/libclang/CXString.cpp
===
--- tools/libclang/CXString.cpp
+++ tools/libclang/CXString.cpp
@@ -15,99 +15,141 @@
 
 #include "CXString.h"
 #include "CXTranslationUnit.h"
+#include "clang-c/CXString.h"
 #include "clang-c/Index.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "llvm/Support/ErrorHandling.h"
+#include 
 
-using namespace clang;
-
-/// Describes the kind of underlying data in CXString.
-enum CXStringFlag {
-  /// CXString contains a 'const char *' that it doesn't own.
-  CXS_Unmanaged,
+static_assert(sizeof(CXString) <= 16, "");
 
-  /// CXString contains a 'const char *' that it allocated with malloc().
-  CXS_Malloc,
-
-  /// CXString contains a CXStringBuf that needs to be returned to the
-  /// CXStringPool.
-  CXS_StringBuf
-};
+using namespace clang;
 
 namespace clang {
 namespace cxstring {
 
+/**
+ * This is for \b CXString 's which are created with \b CreateRef(StringRef).
+ * We'll store the info from the input \b StringRef: char ptr and size.
+ *
+ * We don't know for sure whether this is null-terminated so, when and if
+ * \b clang_getCString is called for this \b CXString, we'll allocate C string
+ * storage and copy data into the storage.  We'll memo-ize that in the
+ * \b CString member.
+ *
+ * This is refcounted; the \b Count member is initially 1.  When a \b CXString
+ * instance using this object is disposed via \b clang_disposeString, \b Count
+ * is decremented.  When this string is duplicated the \b Count increases.
+ *
+ * When \b Count finally drops to zero, the ptr at \b CString, and this object,
+ * should be freed with \b free .
+ */
+struct RefCountedCharRange {
+  const char *Data;
+  const char *CString;
+  unsigned Size;
+  unsigned Count;
+};
+
 //===--===//
 // Basic generation of CXStrings.
 //===--===//
 
 CXString createEmpty() {
   CXString Str;
-  Str.data = "";
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) "";
+  Str.Size = 0;
+  Str.IsNullTerminated = true;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createNull() {
   CXString Str;
-  Str.data = nullptr;
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = nullptr;
+  Str.Size = 0;
+  Str.IsNullTerminated = false;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(const char *String) {
-  if (String && String[0] == '\0')
+  if (!String) {
+return createNull();
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = String;
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) String;
+  Str.Size = strlen(String);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
+inline static const char *copyCharRange(const char *CS, unsigned Size) {
+  char *Spelling = static_cast(malloc(Size + 1));
+  memcpy(Spelling, CS, Size);
+  Spelling[Size] = 0;
+  return Spelling;
+}
+
 CXString createDup(const char *String) {
-  if (!String)
+  if (!String) {
 return createNull();
-
-  if (String[0] == '\0')
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = strdup(String);
-  Str.private_flags = CXS_Malloc;
+  Str.Size = strlen(String);
+  Str.Contents = (const void *) copyCharRange(String, Str.Size);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = true;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(StringRef String) {
-  // If the string is not nul-terminated, we have to make a copy.
-
-  // FIXME: This is doing a one past end read, and should be removed! For memory
-  // we don't manage, the API string can become unterminated at any time outside
-  // our control.
-
-  if (!String.empty() && String.data()[String.size()] != 0)
-return createDup(String);
-
-  CXString Result;
-  Result.data = String.data();
-  Result.private_flags = (unsigned) CXS_Unmanaged;
-  return Result;
+  assert (String.size() <= std::numeric_limits::max());
+  CXString Str;
+  Str.Size = unsigned(String.size());
+  Str.IsNullTerminated = false;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
+  auto *RC = new RefCountedCharRange {
+/* Data */ String.data(),
+/* CString */ nullptr,
+/* Size */ Str.Size,
+/* Count */ 1,
+  };
+  

[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-16 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande updated this revision to Diff 129959.
elsteveogrande added a comment.

Change string-copy-on-demand logic; do only if `not IsNullTerminated`.


Repository:
  rC Clang

https://reviews.llvm.org/D42043

Files:
  include/clang-c/CXString.h
  tools/c-index-test/c-index-test.c
  tools/libclang/CXString.cpp
  tools/libclang/CXString.h

Index: tools/libclang/CXString.h
===
--- tools/libclang/CXString.h
+++ tools/libclang/CXString.h
@@ -106,4 +106,3 @@
 }
 
 #endif
-
Index: tools/libclang/CXString.cpp
===
--- tools/libclang/CXString.cpp
+++ tools/libclang/CXString.cpp
@@ -15,99 +15,141 @@
 
 #include "CXString.h"
 #include "CXTranslationUnit.h"
+#include "clang-c/CXString.h"
 #include "clang-c/Index.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "llvm/Support/ErrorHandling.h"
+#include 
 
-using namespace clang;
-
-/// Describes the kind of underlying data in CXString.
-enum CXStringFlag {
-  /// CXString contains a 'const char *' that it doesn't own.
-  CXS_Unmanaged,
+static_assert(sizeof(CXString) <= 16, "");
 
-  /// CXString contains a 'const char *' that it allocated with malloc().
-  CXS_Malloc,
-
-  /// CXString contains a CXStringBuf that needs to be returned to the
-  /// CXStringPool.
-  CXS_StringBuf
-};
+using namespace clang;
 
 namespace clang {
 namespace cxstring {
 
+/**
+ * This is for \b CXString 's which are created with \b CreateRef(StringRef).
+ * We'll store the info from the input \b StringRef: char ptr and size.
+ *
+ * We don't know for sure whether this is null-terminated so, when and if
+ * \b clang_getCString is called for this \b CXString, we'll allocate C string
+ * storage and copy data into the storage.  We'll memo-ize that in the
+ * \b CString member.
+ *
+ * This is refcounted; the \b Count member is initially 1.  When a \b CXString
+ * instance using this object is disposed via \b clang_disposeString, \b Count
+ * is decremented.  When this string is duplicated the \b Count increases.
+ *
+ * When \b Count finally drops to zero, the ptr at \b CString, and this object,
+ * should be freed with \b free .
+ */
+struct RefCountedCharRange {
+  const char *Data;
+  const char *CString;
+  unsigned Size;
+  unsigned Count;
+};
+
 //===--===//
 // Basic generation of CXStrings.
 //===--===//
 
 CXString createEmpty() {
   CXString Str;
-  Str.data = "";
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) "";
+  Str.Size = 0;
+  Str.IsNullTerminated = true;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createNull() {
   CXString Str;
-  Str.data = nullptr;
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = nullptr;
+  Str.Size = 0;
+  Str.IsNullTerminated = false;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(const char *String) {
-  if (String && String[0] == '\0')
+  if (!String) {
+return createNull();
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = String;
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) String;
+  Str.Size = strlen(String);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
+inline static const char *copyCharRange(const char *CS, unsigned Size) {
+  char *Spelling = static_cast(malloc(Size + 1));
+  memcpy(Spelling, CS, Size);
+  Spelling[Size] = 0;
+  return Spelling;
+}
+
 CXString createDup(const char *String) {
-  if (!String)
+  if (!String) {
 return createNull();
-
-  if (String[0] == '\0')
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = strdup(String);
-  Str.private_flags = CXS_Malloc;
+  Str.Size = strlen(String);
+  Str.Contents = (const void *) copyCharRange(String, Str.Size);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = true;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(StringRef String) {
-  // If the string is not nul-terminated, we have to make a copy.
-
-  // FIXME: This is doing a one past end read, and should be removed! For memory
-  // we don't manage, the API string can become unterminated at any time outside
-  // our control.
-
-  if (!String.empty() && String.data()[String.size()] != 0)
-return createDup(String);
-
-  CXString Result;
-  Result.data = String.data();
-  Result.private_flags = (unsigned) CXS_Unmanaged;
-  return Result;
+  assert (String.size() <= std::numeric_limits::max());
+  CXString Str;
+  Str.Size = unsigned(String.size());
+  Str.IsNullTerminated = false;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
+  auto *RC = new RefCountedCharRange {
+/* Data */ String.data(),
+/* CString */ nullptr,
+/* Size */ Str.Size,
+/* 

[PATCH] D42043: c-index: CXString: fix MSAN read-past-end bug

2018-01-14 Thread Steve O'Brien via Phabricator via cfe-commits
elsteveogrande created this revision.
Herald added a subscriber: cfe-commits.

Previous impl would read the byte past the end of a string (a 
`llvm::StringRef`), possibly exceeding the allocation for that memory and 
raising an MSAN issue.

This will instead save the character range specified by the `StringRef` and 
copy it on-demand to a new C string.

Passes existing tests, and now passes with MSAN as well.


Repository:
  rC Clang

https://reviews.llvm.org/D42043

Files:
  include/clang-c/CXString.h
  tools/c-index-test/c-index-test.c
  tools/libclang/CXString.cpp
  tools/libclang/CXString.h

Index: tools/libclang/CXString.h
===
--- tools/libclang/CXString.h
+++ tools/libclang/CXString.h
@@ -106,4 +106,3 @@
 }
 
 #endif
-
Index: tools/libclang/CXString.cpp
===
--- tools/libclang/CXString.cpp
+++ tools/libclang/CXString.cpp
@@ -15,99 +15,141 @@
 
 #include "CXString.h"
 #include "CXTranslationUnit.h"
+#include "clang-c/CXString.h"
 #include "clang-c/Index.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "llvm/Support/ErrorHandling.h"
+#include 
 
-using namespace clang;
-
-/// Describes the kind of underlying data in CXString.
-enum CXStringFlag {
-  /// CXString contains a 'const char *' that it doesn't own.
-  CXS_Unmanaged,
+static_assert(sizeof(CXString) <= 16, "");
 
-  /// CXString contains a 'const char *' that it allocated with malloc().
-  CXS_Malloc,
-
-  /// CXString contains a CXStringBuf that needs to be returned to the
-  /// CXStringPool.
-  CXS_StringBuf
-};
+using namespace clang;
 
 namespace clang {
 namespace cxstring {
 
+/**
+ * This is for \b CXString 's which are created with \b CreateRef(StringRef).
+ * We'll store the info from the input \b StringRef: char ptr and size.
+ *
+ * We don't know for sure whether this is null-terminated so, when and if
+ * \b clang_getCString is called for this \b CXString, we'll allocate C string
+ * storage and copy data into the storage.  We'll memo-ize that in the
+ * \b CString member.
+ *
+ * This is refcounted; the \b Count member is initially 1.  When a \b CXString
+ * instance using this object is disposed via \b clang_disposeString, \b Count
+ * is decremented.  When this string is duplicated the \b Count increases.
+ *
+ * When \b Count finally drops to zero, the ptr at \b CString, and this object,
+ * should be freed with \b free .
+ */
+struct RefCountedCharRange {
+  const char *Data;
+  const char *CString;
+  unsigned Size;
+  unsigned Count;
+};
+
 //===--===//
 // Basic generation of CXStrings.
 //===--===//
 
 CXString createEmpty() {
   CXString Str;
-  Str.data = "";
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) "";
+  Str.Size = 0;
+  Str.IsNullTerminated = true;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createNull() {
   CXString Str;
-  Str.data = nullptr;
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = nullptr;
+  Str.Size = 0;
+  Str.IsNullTerminated = false;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(const char *String) {
-  if (String && String[0] == '\0')
+  if (!String) {
+return createNull();
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = String;
-  Str.private_flags = CXS_Unmanaged;
+  Str.Contents = (const void *) String;
+  Str.Size = strlen(String);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = false;
+  Str.IsPooled = false;
   return Str;
 }
 
+inline static const char *copyCharRange(const char *CS, unsigned Size) {
+  char *Spelling = static_cast(malloc(Size + 1));
+  memcpy(Spelling, CS, Size);
+  Spelling[Size] = 0;
+  return Spelling;
+}
+
 CXString createDup(const char *String) {
-  if (!String)
+  if (!String) {
 return createNull();
-
-  if (String[0] == '\0')
+  }
+  if (String[0] == '\0') {
 return createEmpty();
+  }
 
   CXString Str;
-  Str.data = strdup(String);
-  Str.private_flags = CXS_Malloc;
+  Str.Size = strlen(String);
+  Str.Contents = (const void *) copyCharRange(String, Str.Size);
+  Str.IsNullTerminated = true;
+  Str.IsOwned = true;
+  Str.IsPooled = false;
   return Str;
 }
 
 CXString createRef(StringRef String) {
-  // If the string is not nul-terminated, we have to make a copy.
-
-  // FIXME: This is doing a one past end read, and should be removed! For memory
-  // we don't manage, the API string can become unterminated at any time outside
-  // our control.
-
-  if (!String.empty() && String.data()[String.size()] != 0)
-return createDup(String);
-
-  CXString Result;
-  Result.data = String.data();
-  Result.private_flags = (unsigned) CXS_Unmanaged;
-  return Result;
+  assert (String.size() <= std::numeric_limits::max());
+  CXString Str;