[PATCH] D159519: [clang][AST][ASTImporter] improve AST comparasion on VarDecl & GotoStmt

2023-09-19 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM. I'm not very familiar with this area, but the change seems to be a very 
clean improvement.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159519/new/

https://reviews.llvm.org/D159519

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-09-18 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Looks good, thanks for the improvements!

One minor remark: your commit uses backticks (`) around the class names in the 
bug reports, which is commonly used to mark code fragments e.g. in commit 
messages; however, in the bug reports IIRC most other checkers use apostrophes 
(').


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-09-12 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/test/Analysis/ArrayDelete.cpp:85-88
+Derived *d3 = new DoubleDerived[10];
+Base *b3 = d3; // expected-note{{Conversion from derived to base happened 
here}}
+delete[] b3; // expected-warning{{Deleting an array of polymorphic objects 
is undefined}}
+// expected-note@-1{{Deleting an array of polymorphic objects is 
undefined}}

steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > Hmm, the static type of `d3` doesn't tell us much about how it refers to 
> > > an object of type `DoubleDerived`.
> > > To me, it would make sense to have multiple `Conversion from derived to 
> > > base happened here`, even telling us what static type it converted to 
> > > what other static type in the message.
> > > And it should add a new visitor of the same kind tracking the castee.
> > > 
> > > ```
> > > Derived *d3 = new DoubleDerived[10]; // note: `DoubleDerived` -> 
> > > `Derived` here
> > > Base *b3 = d3; // note: `Derived` -> `Base` here
> > > delete[] b3; // warn: Deleting `Derived` objects as `Base` objects.
> > > ```
> > > WDYT @donat.nagy ?
> > I agree that it would be good to be good to mention the class names in the 
> > message.
> Do you also agree that we should have all steps where such a conversion 
> happened?
> Notice the 2 `note:` markers in my example. @donat.nagy 
It would be a nice addition if it wouldn't seriously complicate the 
implementation.

If we want to report multiple/all conversions, then we would need to create 
messages for back-and-forth conversions (e.g. allocating Derived, converting it 
to Base, back to Derived, back to Base, then deleting it illegally).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-09-12 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp:47
+protected:
+  class PtrCastVisitor : public BugReporterVisitor {
   public:

I like that yo switched to a more descriptive class name :)



Comment at: clang/test/Analysis/ArrayDelete.cpp:85-88
+Derived *d3 = new DoubleDerived[10];
+Base *b3 = d3; // expected-note{{Conversion from derived to base happened 
here}}
+delete[] b3; // expected-warning{{Deleting an array of polymorphic objects 
is undefined}}
+// expected-note@-1{{Deleting an array of polymorphic objects is 
undefined}}

steakhal wrote:
> Hmm, the static type of `d3` doesn't tell us much about how it refers to an 
> object of type `DoubleDerived`.
> To me, it would make sense to have multiple `Conversion from derived to base 
> happened here`, even telling us what static type it converted to what other 
> static type in the message.
> And it should add a new visitor of the same kind tracking the castee.
> 
> ```
> Derived *d3 = new DoubleDerived[10]; // note: `DoubleDerived` -> `Derived` 
> here
> Base *b3 = d3; // note: `Derived` -> `Base` here
> delete[] b3; // warn: Deleting `Derived` objects as `Base` objects.
> ```
> WDYT @donat.nagy ?
I agree that it would be good to be good to mention the class names in the 
message.



Comment at: clang/test/Analysis/ArrayDelete.cpp:90-93
+Base *b4 = new DoubleDerived[10];
+Derived *d4 = reinterpret_cast(b4);
+DoubleDerived *dd4 = reinterpret_cast(d4);
+delete[] dd4; // no-warning

steakhal wrote:
> I think in such cases a `static_cast` should suffice; unless you 
> intentionally wanted to test `reinterpret_cast` of course.
Note that the effects of `reinterpret_cast` and `static_cast` are different 
[1]: `static_cast(BasePtr)` adjusts the pointer value to ensure that 
it points to the derived object whose Base ancestor object would be located at 
BasePtr, while a `reinterpret_cast` keeps the raw pointer value (which is 
complete nonsense from an object-oriented point-of-view, but could be relevant 
in some low-level hacks).

I'd guess that this part is directly testing the strange behavior of 
`reinterpret_cast`; but it'd be good to add a comment that clarifies this.

[1]: 
https://stackoverflow.com/questions/9138730/why-is-it-important-to-use-static-cast-instead-of-reinterpret-cast-here


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159479: [ASTImport]enhance statement comparing

2023-09-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

Today we learned that truth is different from falsehood...

LGTM, nice catch!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159479/new/

https://reviews.llvm.org/D159479

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158813: [analyzer] MPIChecker: MPI_Waitall should respect count arg

2023-09-06 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy requested changes to this revision.
donat.nagy added a comment.
This revision now requires changes to proceed.

I reviewed this change and collected my suggestions in comments.

In general, I feel that the code added by this commit is "sloppy" in the sense 
that it's designed for the common case and would behave unpredictably in 
unusual situations. This is a fault of the toolbox provided by the analyzer: in 
a saner world, you could freely combine the API functions (as you did) and 
expect that they would "just work" together; but in reality every function has 
its quirks and limitations, so the developer must understand the details of the 
implementation.




Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:189-190
+ASTCtx.getCharWidth();
+const NonLoc MROffset =
+SVB.makeArrayIndex(MR->getAsOffset().getOffset() / ElemSizeInBits);
 

danix800 wrote:
> steakhal wrote:
> > What implies that `MR->getAsOffset()` succeeds, and also what implies that 
> > the `Offset` within that is not symbolic?
> > Also, how can you use this without also using the result region?
> > Without using that you don't know what is the anchor point, from which this 
> > offset represent anything.
> > ATM I believe the code assumes that `MR->getRegion()` equals to 
> > `SuperRegion`, which might not be always the case.
> > This could materialize a problem when you construct the element region 
> > later.
> I'll restrict the checker to handle non-symbolic offset only.
> ATM I believe the code assumes that MR->getRegion() equals to SuperRegion, 
> which might not be always the case.
This remark of @steakhal highlights a problem which is still present in your 
code: when you calculate `SuperRegion` you strip a single `ElementRegion` 
layer; while the offset that you get via `getDynamicElementCountWithOffset()` 
is calculated from `MemRegion::getAsOffset()` which can strip multiple 
`ElementRegion`, `FieldRegion` etc. layers.

Unfortunately the memory region API of Clang Static Analyzer is very 
inconsistent and quirky; you must dig deep into the implementation to 
understand the exact meaning of these function calls. 

For example, if you have a data type like
```lang=c
struct ReqStruct {
int custom_info;
MPI_Request reqs[8];
} rs;
```
then the `SuperRegion` will refer to the immediate parent of the 
`ElementRegion` (which is presumably the `FieldRegion` corresponding to 
`reqs`); but the offset value is measured from the start of the `VarRegion` 
corresponding to the variable `rs` (and includes `sizeof(int)` + potential 
padding due to the presence of `custom_info`).

I think the correct solution would measure the offset from the start of the 
array of `MPI_Request` objects; because if you measure it from the start of the 
struct, then you may encounter serious issues, e.g. there is no guarantee that 
the offset will be divisible by the size of `MPI_Request`.

Unfortunately this would mean that you cannot rely on the logic of 
`getDynamicElementCountWithOffset()`;  and I don't see a clear way to 
generalize that library function to limit the number/type of region layers that 
it strips.

Here I don't see a clear way forward, as the two potential solutions are a deep 
rewrite that complicates the library code and a creating a locally tweaked 
duplicate of the library logic -- and these both have serious drawbacks.



Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:146
 
+static std::optional>
+getRequestRegionOffsetAndCount(const MemRegion *const MR, const CallEvent ) 
{

Use `long long` (or `int64_t`, `ssize_t`, whatever you prefer) instead of 
`NonLoc` here because it's easier to handle them (and you don't work with 
symbolic offsets anyway).



Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:174
+  return std::make_pair(
+  SVB.makeArrayIndex(RequestRegionOffset.getOffset() / TypeSizeInBits),
+  RequestRegionCount->getValue());

What happens here in the case when `TypeSizeInChars.isZero()` holds?

I see that you used a ternary operator to avoid division by zero; but do you 
get a meaningful and valid offset value in the case when `MPI_Request` has zero 
size and you use `TypeSizeInBits == 8`?

I understand that the divisior does not matter in the case when 
`RequestRegionOffset.getOffset()` is zero; but if I understand it correctly 
this offset value can be nonzero even in the case when `MPI_Request` has zero 
size. (Under the hood it calls `MemRegion::getAsOffset()` which may include the 
offset of e.g. a data member of an object.)



Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:213
+// StdCLibraryFunctions
+for (size_t i = 0; i < RegionCount && i < RequestedCount; ++i) {
+  auto RegionIndex =

Paranoid remark: if these count values are both very 

[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-09-05 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/docs/analyzer/checkers.rst:1793-1803
+.. code-block:: cpp
+
+ Base *create() {
+   Base *x = new Derived[10]; // note: conversion from derived to base
+  //   happened here
+   return x;
+ }

steakhal wrote:
> Discookie wrote:
> > steakhal wrote:
> > > Make sure in the example the `create` is related (e.g. called/used).
> > > Also, refrain from using `sink` in the docs. It's usually used in the 
> > > context of taint analysis.
> > Changed the example - should I change the DeleteWithNonVirtualDtor example 
> > as well? That has the same issues as you said here.
> I have no opinion. You could.
I think you should definitely update it, for the sake of consistency and just 
improving things whenever we can. This commit already touches the code of that 
checker, there is no point in leaving bad documentation behind us...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159412: [analyzer]FieldRegion in getStaticSize should return size of pointee type

2023-09-04 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Please add a testcase that demonstrates this issue (fails when your change in 
MemRegion.cpp isn't added) and shows that your commit fixes it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159412/new/

https://reviews.llvm.org/D159412

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159105: [analyzer] ArrayBoundCheckerV2 should check the region for taint as well

2023-09-01 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

As I thought more about this commit I realized that I have two vague but 
significant concerns. I'm sorry that I wasn't able to describe these before you 
started to dig into the details of the heuristics.

(1) I don't think that code like `int *ptr; scanf("%ld", );` should be 
covered by an ArrayBound checker: this is not just a "be careful with the value 
of the index" issue, but a raw pointer that's completely controlled by an 
untrusted source. I'd try to cover this kind of bug with either the StdLibrary 
checker or a small new checker [or perhaps just a clang-tidy check] which would 
report pointers coming from `scanf` or other user input even if it doesn't see 
a dereference of the pointer value.

(2) Taintedness of a memory region (i.e. `isTainted(state, location)` where 
`location` is presumably a `MemRegionVal`) is poorly defined and might mean 
three different things:
(a) **The 'begin' pointer value of the region is tainted** (controlled by an 
untrusted source). This may be a "we're already dead" situation (e.g. the 
pointer is an arbitrary value from `scanf`) or something completely harmless 
(we are examining something like the second subscript operator in 
`large_array[user_input].field[idx]` after verifying that the first subscript 
with the tainted index is valid), but in either case, the reliability of the 
'begin' pointer should not affect the comparison between the region extent and 
the index value.
(b)** The extent of the region is tainted** (controlled by an untrusted source, 
e.g. it's an user input string with an unknown length). In this case it's 
reasonable to turn on the "paranoid" comparison mode -- however, this situation 
should be recorded by marking the //extent symbol// of the region as tainted 
instead of marking the region itself as tainted. I'd be happy to see a commit 
that ensures that the extent symbol of user input strings is tainted and its 
taintedness is handled in ArrayBoundsV2.
(c)** The contents of the region are tainted** (controlled by an untrusted 
source). Arguably this should be recorded by marking the //contents// as 
tainted, but that's difficult to implement on a string/array, so it's a 
somewhat reasonable shortcut to put the taint on the region itself. This kind 
of taint is completely irrelevant for out of bounds memory access and should 
not affect the comparisons.

In summary, I'd say that this checker should not be affected by the taintedness 
of the memory region itself (but it should continue to monitor tainted indices, 
and if possible, it should be extended to handle tainted extent values). I'm 
not completely opposed to merging this patch if you add enough heuristics and 
workarounds to make it stable in practice, but I feel that this is not the 
right way forward.

It's fortunate that we'll meet in person on Monday because there we can discuss 
this topic (and other plans) in more detail.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159105/new/

https://reviews.llvm.org/D159105

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

2023-08-31 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/test/Analysis/memory-model.cpp:167
+  clang_analyzer_dumpExtent(a);   // expected-warning {{0 S64b}}
+  clang_analyzer_dumpElementCount(a); // expected-warning {{5 S64b}}
+  clang_analyzer_dumpExtent(t);   // expected-warning {{0 S64b}}

steakhal wrote:
> If the array has zero extent, how can is have any elements?
It has five elements, each element is a 0-element array, total size is 5*0 = 0.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159107: [analyzer] ArrayBoundCheckerV2 should disallow forming lvalues to out-of-bounds locations

2023-08-31 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D159107#4630764 , @steakhal wrote:

> In D159107#4630573 , @donat.nagy 
> wrote:
>
>> I don't think that the `[N]` issue is too serious: we can just increment 
>> the array extent when the parent expression of the array subscript operator 
>> is the unary operator `&`. If the past-the-end pointer ends up dereferenced 
>> later, the current code is sufficient to report it as a bug (as the checker 
>> monitors all dereferences).
>
> My instinct suggests that there is more behind the curtain.

As a rough solution we can simply say that this checker ignores `[idx]` 
(because it's just a different way of writing `arr + idx`) and only checks 
expressions where "real" dereference happens. This way the checker would won't 
emit false positives on past-the-end pointers and still report every out of 
bound //memory access// (including off-by-one errors).

This can be refined by adding a check that which validates that in an 
expression like `[idx]` the index satisfies `0 <= idx && idx <= 
array_size`. This is conceptually independent (but can use the same 
implementation as the real memory access check) and would add some useful 
reports and constraints without breaking anything.

In D159107#4630764 , @steakhal wrote:

> For example, on an expression `[N]` (of type  `int arr[10]`), we would 
> constrain `N`. We could say that we allow the one before and one after 
> elements, thus introduce a constraint: `N: [-1, 11]`. This `-1` later could 
> participate in casts, and end up interpreted as a really large unsigned 
> number. I should also think about how would we detect off-by-one errors, 
> which is a common category.

Pointer arithmetic is very restricted in the standard, e.g. 
http://eel.is/c++draft/basic.compound says that

> Every value of pointer type is one of the following:
> (3.1) a pointer to an object or function (the pointer is said to point to the 
> object or function), or
> (3.2) a pointer past the end of an object ([expr.add]), or
> (3.3) the null pointer value for that type, or
> (3.4) an invalid pointer value.

As this explicitly rules out before-first-element pointers and they are not too 
common (I don't recall any code that used them), I don't think that they 
deserve a special case (just report them if we see them).

In D159107#4630764 , @steakhal wrote:

> The thing is, that I only have this week, aka. 1.5 days before I leave for 
> vacation. :D
> In the future, (no ETA), we likely come back to the Juliet improvements.

Of course, it's completely reasonable to focus on a limited set of changes 
before the vacation.

In the meantime, I'll probably work on one or more commits that would (1) 
switch to using the concrete `check::PostStmt` callbacks instead of 
`check::Location` (and `Bind`) like this commit (2) improve the diagnostics, 
add more details to the messages. When will you return from the vacation? 
Should I wait for you with the review of these changes (if I can write and test 
them before your return)?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159107/new/

https://reviews.llvm.org/D159107

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159107: [analyzer] ArrayBoundCheckerV2 should disallow forming lvalues to out-of-bounds locations

2023-08-31 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I don't think that the `[N]` issue is too serious: we can just increment 
the array extent when the parent expression of the array subscript operator is 
the unary operator `&`. If the past-the-end pointer ends up dereferenced later, 
the current code is sufficient to report it as a bug (as the checker monitors 
all dereferences).

I'd be happy to see (a slightly extended variant of) this commit merged, 
because I could provide much better warning messages if I can access the 
concrete subscript/dereference operations. Of course if you don't have time to 
work on this I can put this up for review myself (probably after your other 
commits are handled).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159107/new/

https://reviews.llvm.org/D159107

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp:34
+if (auto SSize =
+SVB.convertToArrayIndex(*Size).getAs())
+  return *SSize;

danix800 wrote:
> donat.nagy wrote:
> > I think it's a good convention if `getDynamicExtent()` will always return 
> > concrete values as `ArrayIndexTy`. (If I didn't miss some very obscure 
> > case, then this will be true when this commit is merged.) 
> Could you elaborate this a bit more please? Do you mean by 
> `nonloc::ConcreteInt` rather than `DefinedOrUnknownSVal`?
Sorry for the confusing comment! I just intended to summarize what you did 
without requesting any additional change.

My remark highlighted that (after your change) whenever the result of this 
function is a `nonloc::ConcreteInt`, the `getType()` method of that 
`ConcreteInt` object will return `ArrayIndexTy` (which is a `QualType` constant 
that represents the type `long long`).

There are also some situations when the result of `getDynamicExtent()` is not a 
`nonloc::ConcreteInt`; so it's correct that the return type should be 
`DefinedOrUnknownSVal` (which is the common supertype of all possible results 
[1]) and it's correct to use this return type in the `getAs<>` on the 
highlighted line.

[1] See inheritance diagram at 
https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159108: [analyzer] CStringChecker should check the first byte of the destination of strcpy, strncpy

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM if the test results are good.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159108/new/

https://reviews.llvm.org/D159108

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159109: [analyzer] CStringChecker buffer access checks should check the first bytes

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM if the test results are also good.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159109/new/

https://reviews.llvm.org/D159109

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159107: [analyzer] ArrayBoundCheckerV2 should disallow forming lvalues to out-of-bounds locations

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Good direction of development, this will be useful for providing better bug 
reports (in addition to ensuring correct behavior some situations).

Note that it's also possible to dereference pointers with the operator `->`, 
which is represented by `MemberExpr`s in the AST; we should probably handle 
that as if it was a `UO_Deref`.

There is also a small corner case that for an array `some_type arr[N]` it's 
well-defined to form the past-the-end pointer as `[N]` (instead of `arr + 
N`) -- while any other use of `arr[N]` is undefined behavior. If this occurs in 
practice, then we'll probably need some additional logic to handle it. (Note 
that the `check::Location` implementation dodged this question, because it 
didn't report anything when the program formed `[N]`, but later created a 
bug report when this pointer value was dereferenced.)




Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:34
+class ArrayBoundCheckerV2
+: public Checker,
+ check::PostStmt> {

Which testcase would break without the `check::Bind` callback? (Not action 
needed, I'm just curious.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159107/new/

https://reviews.llvm.org/D159107

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159105: [analyzer] ArrayBoundCheckerV2 should check the region for taint as well

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D159105#4627785 , @steakhal wrote:

> In D159105#4627724 , @donat.nagy 
> wrote:
>
>> I was thinking about adding an improvement like this for the sake of 
>> consistency, but I fear that this might cause a surprising amount of false 
>> positives. (Did you test it on larger codebases?)
>
> I'm just done backporting the first batch to our fork and scheduling a 
> large-scale measurement.
> But my gut instinct suggests there won't be too many new reports, as I 
> believe "tainted" pointers shouldn't be really a thing, except for really 
> exotic cases, like `scanf` a pointer - which is dubious at best in presence 
> of ASLR.
>
>> As far as I understand (correct me if I'm wrong), there are situations when 
>> TaintChecker marks the memory region of e.g. a returned string as tainted to 
>> mark that the //contents// of the region are unreliable. (There may be other 
>> more exotic situations when even the //pointer value// or the //extent// 
>> becomes unreliable e.g. your testcases where you `scanf` into a pointer.)
>
> Taint can only bind to symbols, thus a memregion per-say cannot be tainted, 
> only the "conjured address of it" or the "symbol bound to the 
> pointee's/referred lvalue" can be tainted.

Thanks for the clarification! As I dig deeper I see that the functions that 
look like "let's taint a MemRegion" are in fact only affecting 
`SymbolicRegion`s (targeting the corresponding symbol). This taint may be seen 
by the logic that checks the taintedness of the ElementRegion -- but as it's 
limited to symbolic regions it probably won't appear in the underflow test 
[which is skipped for symbolic regions //in unknown space//]. (There are also 
some calls which taint the pointee's/referred value -- that won't affect us.)

> WDYM by "are unreliable" and "becomes unreliable"?

"may be controlled by a malicious actor", basically a synonym of "tainted" but 
I used "tainted" for things that are marked as tainted by the SA code and 
"unreliable" for things that would be marked as tainted by an ideal algorithm.

>> I fear that the overzealous handling of tainted data would produce too many 
>> false positives in situations when code indexes strings that contain user 
>> input and the SA engine cannot understand the logic that calculates the 
>> indices. For example if I understand it correctly a function like
>>
>>   char f(int n) {
>> char *var = getenv("FOO");
>> return var[n];
>>   }
>>
>> would be reported as an out of bounds memory access, because the region is 
>> tainted and the index value is not known. This is especially problematic on 
>> the underflow side (which is introduced by your commit), because I'd assume 
>> that it's common to have numeric values (e.g. function arguments) where the 
>> programmer knows that they're nonnegative, but the analyzer cannot deduce 
>> this.
>
> To me, this is a TP. The envvar might not be present, and even if it's 
> present, the relation of the string length of `var` to `n` is not expressed 
> or checked; thus this deserves a report.
> But I see your concern, however, I cannot think of valid counterexamples, 
> aka. FPs off the top of my head. I'll come back once I see the real results.

Hmm, you're right that my example is a TP as written; however I could imagine 
similar FPs where we e.g. check that `n < strlen(var)` but don't check that `n` 
is nonnegative (because that's the responsibility of the caller).

Anyway, let's wait for the test results, they'll be enlightening. (You mostly 
convinced me that your commit will work, but Murphy's laws are still 
standing...)

>> Also note that the error message "Out of bound memory access (index is 
>> tainted)" becomes misleading after this patch -- but don't spend too much 
>> time on resolving this, because right now I'm working on a complete rewrite 
>> the message generation to replace the current spartan messages with useful 
>> error reporting.
>
> I agree, but I'm not sure how much more readable something more accurate like 
> "the location where this pointer points to potentially depends on untrusted 
> tainted data". (Note that this would also work for tainted indexes as well).
>
> I must admit, that reporting and diagnostics were low priorities for this 
> patch stack, so I'd focus on addressing logic concerns first for the whole 
> stack and then come back to reporting to make it consistent across the stack.

Improving reporting and diagnostics is also on my TODO list, so we should 
coordinate progress in that area to avoid redundant work. If you wish to work 
on it, then I'm happy to review; otherwise I'll do something with it soon 
(perhaps after you merged these commits to avoid merge conflicts).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159105/new/

https://reviews.llvm.org/D159105


[PATCH] D159106: [analyzer] ArrayBoundCheckerV2 should listen to check::Bind as well

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

This seems to be a straightforward improvement over the current situation; LGTM 
if you test(ed) it on some real-life code (to ensure that it doesn't introduce 
a corner case that crashes).




Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:46
 
+  void impl(SVal Loc, bool isLoad, const Stmt *S, CheckerContext ) const;
+

I'd call this function `performCheck` or something similar, but "`impl`" is 
also fine especially if it's a traditional name (that I haven't encountered yet 
in clang SA code).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159106/new/

https://reviews.llvm.org/D159106

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159105: [analyzer] ArrayBoundCheckerV2 should check the region for taint as well

2023-08-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I was thinking about adding an improvement like this for the sake of 
consistency, but I fear that this might cause a surprising amount of false 
positives. (Did you test it on larger codebases?)

As far as I understand (correct me if I'm wrong), there are situations when 
TaintChecker marks the memory region of e.g. a returned string as tainted to 
mark that the //contents// of the region are unreliable. (There may be other 
more exotic situations when even the //pointer value// or the //extent// 
becomes unreliable e.g. your testcases where you `scanf` into a pointer.)

I fear that the overzealous handling of tainted data would produce too many 
false positives in situations when code indexes strings that contain user input 
and the SA engine cannot understand the logic that calculates the indices. For 
example if I understand it correctly a function like

  char f(int n) {
char *var = getenv("FOO");
return var[n];
  }

would be reported as an out of bounds memory access, because the region is 
tainted and the index value is not known. This is especially problematic on the 
underflow side (which is introduced by your commit), because I'd assume that 
it's common to have numeric values (e.g. function arguments) where the 
programmer knows that they're nonnegative, but the analyzer cannot deduce this.

Also note that the error message "Out of bound memory access (index is 
tainted)" becomes misleading after this patch -- but don't spend too much time 
on resolving this, because right now I'm working on a complete rewrite the 
message generation to replace the current spartan messages with useful error 
reporting.




Comment at: clang/test/Analysis/taint-generic.c:1133-1141
+void testTaintedLowerConstrainedIndex() {
+  int n;
+  scanf("%d", );
+  if (n >= 0) {
+// We only constained the lower end, and it's tainted => report.
+Buffer[n] = 1; // expected-warning {{Out of bound memory access (index is 
tainted)}}
+  }

Also add the "opposite" of this where you test `if (n < BUFSIZE)`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159105/new/

https://reviews.llvm.org/D159105

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

2023-08-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Yes, adding tests that demonstrate the current behavior is a good idea.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

2023-08-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D158707#4621135 , @steakhal wrote:

> Oh, so we would canonicalize towards a signed extent type. I see. I think I'm 
> okay with that.
> However, I think this needs a little bit of polishing and testing when the 
> region does not point to the beginning of an array or object, but rather 
> inside an object, or like this:
>
>   int nums[] = {1,2,3};
>   char *p = (char*)[1];
>   
>   clang_analyzer_dumpExtent(p);
>   clang_analyzer_dumpElementCount(p);
>   ++p;
>   clang_analyzer_dumpExtent(p);
>   clang_analyzer_dumpElementCount(p);
>   ++p;
>   int *q = (int*)p;
>   clang_analyzer_dumpExtent(q);
>   clang_analyzer_dumpElementCount(q);

Unfortunately the analyzer engine does not distinguish pointer arithmetic and 
element access, and we cannot handle these situations while that limitation 
exists. For example, if we have an array `int nums[3];`, then the element 
`nums[1]` and the incremented pointer `int *ptr = nums + 1;` are represented by 
the same `ElementRegion` object; so we cannot simultaneously say that (1) 
`nums[1]` is an int-sized memory region, and (2) it's valid to access the 
elements of the array as `ptr[-1]`, `ptr[0]` and `ptr[1]`.

The least bad heuristic is probably `RegionRawOffsetV2::computeOffset()` from 
ArrayBoundCheckerV2, which strips away all the `ElementRegion` layers, acting 
as if they are all coming from pointer arithmetic. This behavior is incorrect 
if we want to query the extent/elementcount of a "real" ElementRegion (e.g. a 
slice of a multidimensional array or `(char*)[1]` in your example), but 
mostly leads to false negatives. On the other hand, if we process a pointer 
increment as if it were an element access, we can easily run into false 
positive reports -- this is why I had to abandon D150446 
.

I'd suggest that we should ignore pointer arithmetic in this commit (or create 
a testcase that documents the current insufficient behavior) and increment the 
priority of a through rewrite of the memory model. This is related to the plan 
https://github.com/llvm/llvm-project/issues/42709 suggested by @NoQ, but 
perhaps it would be enough to do a less drastic change in that direction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158855: [analyzer][NFC] Remove useless class BuiltinBug

2023-08-28 Thread Donát Nagy via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
donat.nagy marked an inline comment as done.
Closed by commit rG8a5cfdf7851d: [analyzer][NFC] Remove useless class 
BuiltinBug (authored by donat.nagy).

Changed prior to commit:
  https://reviews.llvm.org/D158855?vs=553498=553909#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158855/new/

https://reviews.llvm.org/D158855

Files:
  clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/BoolAssignmentChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CastSizeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ChrootChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
  clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/FixedAddressChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/NonNullParamChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ObjCAtSyncChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/PointerArithChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ReturnPointerRangeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ReturnUndefChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/TestAfterDivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefBranchChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefCapturedBlockVarChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefinedArraySubscriptChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefinedAssignmentChecker.cpp
  
clang/lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedObjectChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/VforkChecker.cpp
  clang/lib/StaticAnalyzer/Core/BugReporter.cpp
  clang/unittests/StaticAnalyzer/CallEventTest.cpp
  clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
  clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp

Index: clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
===
--- clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
+++ clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
@@ -89,8 +89,8 @@
 
 class CheckerRegistrationOrderPrinter
 : public Checker> {
-  std::unique_ptr BT =
-  std::make_unique(this, "Registration order");
+  std::unique_ptr BT =
+  std::make_unique(this, "Registration order");
 
 public:
   void checkPreStmt(const DeclStmt *DS, CheckerContext ) const {
@@ -125,8 +125,7 @@
 
 #define UNITTEST_CHECKER(CHECKER_NAME, DIAG_MSG)   \
   class CHECKER_NAME : public Checker> {  \
-std::unique_ptr BT =   \
-std::make_unique(this, DIAG_MSG);  \
+std::unique_ptr BT = std::make_unique(this, DIAG_MSG);   \
\
   public:  \
 void checkPreStmt(const DeclStmt *DS, CheckerContext ) const {}  \
Index: clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
===
--- clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
+++ clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
@@ -26,7 +26,7 @@
 
 class FalsePositiveGenerator : public Checker {
   using Self = FalsePositiveGenerator;
-  const BuiltinBug FalsePositiveGeneratorBug{this, "FalsePositiveGenerator"};
+  const BugType FalsePositiveGeneratorBug{this, "FalsePositiveGenerator"};
   using HandlerFn = bool (Self::*)(const CallEvent ,
CheckerContext &) const;
   CallDescriptionMap Callbacks = {
Index: clang/unittests/StaticAnalyzer/CallEventTest.cpp
===
--- clang/unittests/StaticAnalyzer/CallEventTest.cpp
+++ clang/unittests/StaticAnalyzer/CallEventTest.cpp
@@ -36,11 +36,11 @@
 }
 
 class CXXDeallocatorChecker : public Checker {
-  std::unique_ptr BT_uninitField;
+  std::unique_ptr BT_uninitField;
 
 public:
   CXXDeallocatorChecker()
-  : BT_uninitField(new BuiltinBug(this, "CXXDeallocator")) {}
+  : BT_uninitField(new BugType(this, "CXXDeallocator")) {}
 
   void checkPreCall(const CallEvent , CheckerContext ) const {
 const auto *DC = dyn_cast();
Index: clang/lib/StaticAnalyzer/Core/BugReporter.cpp

[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

2023-08-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

Thanks for the updates! I didn't spot any important issue, so I'm accepting 
this commit, but let's wait for the opinion of @steakhal before merging this.

In D158707#4613955 , @danix800 wrote:

> In D158707#4613743 , @donat.nagy 
> wrote:
>
>> [...] Does this commit fix the "root" of the issue by eliminating some 
>> misuse of correctly implemented (but perhaps surprising) `SVal` / `APSInt` 
>> arithmetic; or is there an underlying bug in the `SVal` / `APSInt` 
>> arithmetic which is just avoided (and not eliminated) by this commit? In the 
>> latter case, what obstacles prevent us from fixing the root cause?
>
> For the observed signed compared to unsigned unexpectation from ArrayBoundV2,
> the constraint manager does give the CORRECT result.
>
> It's a misuse of the constraint API, mainly caused by unexpected unsigned 
> extent
> set by memory modeling. Actually `ArrayBoundV2::getSimplifiedOffsets()` has 
> clear
> comment that this `signed <-> unsigned` comparison is problematic.
>
>   // TODO: once the constraint manager is smart enough to handle non 
> simplified
>   // symbolic expressions remove this function. Note that this can not be 
> used in
>   // the constraint manager as is, since this does not handle overflows. It is
>   // safe to assume, however, that memory offsets will not overflow.
>   // NOTE: callers of this function need to be aware of the effects of 
> overflows
>   // and signed<->unsigned conversions!
>   static std::pair
>   getSimplifiedOffsets(NonLoc offset, nonloc::ConcreteInt extent,
>SValBuilder ) {

In fact, the "NOTE:" comment was added by my patch D135375 
 (which eliminated lots of false positives 
caused by a situation when the argument `offset` was unsigned), but I was still 
confused by this new bug. (Where the other argument `extent` was unsigned and 
this led to incorrect conclusions like "`len` cannot be larger than `3u`, so 
`len` cannot be `-1`, because `-1` is larger than `3u`"  .) Thanks for 
troubleshooting this issue!




Comment at: clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp:34
+if (auto SSize =
+SVB.convertToArrayIndex(*Size).getAs())
+  return *SSize;

I think it's a good convention if `getDynamicExtent()` will always return 
concrete values as `ArrayIndexTy`. (If I didn't miss some very obscure case, 
then this will be true when this commit is merged.) 



Comment at: clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp:49
+return UnknownVal();
+  MR = MR->StripCasts();
+

Note that `StripCasts()` is overzealous and strips array indexing when the 
index is zero. For example if you have a rectangular matrix `int m[6][8];` then 
this function would (correctly) say that the element count of `m[1]` is 8, but 
it would claim that the element count of `m[0]` is 6 (because it strips the 
'cast' and calculates the element count of `m`).

This is caused by the hacky "solution" that casts are represented by 
`ElementRegion{original region, 0, new type}` which cannot be distinguished 
from a real element access where the index happens to be 0. (This is just a 
public service announcement, this already affects e.g. `getDynamicExtent` and I 
don't think that you can find a local solution. For more information, see also 
https://github.com/llvm/llvm-project/issues/42709 which plans to refactor the 
memory model.)



Comment at: clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp:72-75
+  auto ElementCount =
+  SVB.evalBinOp(State, BO_Div, Size, ElementSize, SVB.getArrayIndexType())
+  .getAs();
+  return ElementCount ? *ElementCount : UnknownVal();

Perhaps use the method `value_or` of std::optional?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158968: [analyzer] Fix assertion on casting SVal to NonLoc inside the IteratorRange checker

2023-08-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM. I'm not familiar with the Iterator checker family, but this is a very 
straightforward change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158968/new/

https://reviews.llvm.org/D158968

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158855: [analyzer][NFC] Remove useless class BuiltinBug

2023-08-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked an inline comment as done.
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BoolAssignmentChecker.cpp:27
   class BoolAssignmentChecker : public Checker< check::Bind > {
-mutable std::unique_ptr BT;
+mutable std::unique_ptr BT;
 void emitReport(ProgramStateRef state, CheckerContext ,

xazax.hun wrote:
> In case you are down to some additional cleanup efforts, we no longer need 
> lazy initialization for `BugType`s. This is an old pattern, we used to need 
> it due to some initialization order problems, but those have been worked 
> around in engine. See the FuchsiaHandleChecker as an example how we do it in 
> newer checks: 
> https://github.com/llvm/llvm-project/blob/ea82a822d990824c58c6fa4b503ca84c4870c940/clang/lib/StaticAnalyzer/Checkers/FuchsiaHandleChecker.cpp#L190
> 
> Feel free to ignore this or do it in a follow-up PR. 
Great news! I'll keep it in mind and I'll try to do this clean-up if I have 
enough free time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158855/new/

https://reviews.llvm.org/D158855

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158855: [analyzer][NFC] Remove useless class BuiltinBug

2023-08-25 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.
Herald added a subscriber: ormris.



Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:35-36
 public Checker {
-  mutable std::unique_ptr BT;
+  mutable std::unique_ptr BT;
   mutable std::unique_ptr TaintBT;
 

This inconsistency is the "original target" of this commit. I wanted to unify 
the two types of bug report creation ("regular" and tainted) and as I examined 
the situation I noticed that `BuiltinBug` isa confusing mess and it's always 
easy to replace it with `BugType`. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158855/new/

https://reviews.llvm.org/D158855

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158855: [analyzer][NFC] Remove useless class BuiltinBug

2023-08-25 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy created this revision.
donat.nagy added reviewers: Szelethus, steakhal, gamesh411, NoQ.
Herald added subscribers: manas, ASDenysPetrov, martong, dkrupp, 
mikhail.ramalho, a.sidorin, szepet, baloghadamsoftware, xazax.hun.
Herald added a project: All.
donat.nagy requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

...because it provides no useful functionality compared to its base class 
`BugType`.

A long time ago there were substantial differences between `BugType` and 
`BuiltinBug`, but they were eliminated by commit 1bd58233 in 2009 (!).  Since 
then the only functionality provided by `BuiltinBug` was that it specified 
`categories::LogicError` as the bug category and it stored an extra data member 
`desc`.

This commit sets `categories::LogicError` as the default value of the third 
argument (bug category) in the constructors of BugType and replaces use of the 
`desc` field with simpler logic.

Note that `BugType` has a data member `Description` and a non-virtual method 
`BugType::getDescription()` which queries it; these are distinct from the 
member `desc` of `BuiltinBug` and the shadowing method 
`BuiltinBug::getDescription()` which queries it. This confusing name collision 
was a major motivation for the elimination of `BuiltinBug`.

As this commit touches many files, I avoided functional changes and left behind 
several FIXME notes on messages that should be improved later.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158855

Files:
  clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/BoolAssignmentChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CastSizeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ChrootChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
  clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/FixedAddressChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/NonNullParamChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ObjCAtSyncChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/PointerArithChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ReturnPointerRangeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/ReturnUndefChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/TestAfterDivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefBranchChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefCapturedBlockVarChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefinedArraySubscriptChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/UndefinedAssignmentChecker.cpp
  
clang/lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedObjectChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/VforkChecker.cpp
  clang/lib/StaticAnalyzer/Core/BugReporter.cpp
  clang/unittests/StaticAnalyzer/CallEventTest.cpp
  clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
  clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp

Index: clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
===
--- clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
+++ clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
@@ -89,8 +89,8 @@
 
 class CheckerRegistrationOrderPrinter
 : public Checker> {
-  std::unique_ptr BT =
-  std::make_unique(this, "Registration order");
+  std::unique_ptr BT =
+  std::make_unique(this, "Registration order");
 
 public:
   void checkPreStmt(const DeclStmt *DS, CheckerContext ) const {
@@ -125,8 +125,7 @@
 
 #define UNITTEST_CHECKER(CHECKER_NAME, DIAG_MSG)   \
   class CHECKER_NAME : public Checker> {  \
-std::unique_ptr BT =   \
-std::make_unique(this, DIAG_MSG);  \
+std::unique_ptr BT = std::make_unique(this, DIAG_MSG);   \
\
   public:  \
 void checkPreStmt(const DeclStmt *DS, CheckerContext ) const {}  \
Index: clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
===
--- clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
+++ clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
@@ -26,7 +26,7 @@
 
 class FalsePositiveGenerator : 

[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

2023-08-24 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Thanks for creating this commit, it's a useful improvement!

I added some inline comments on minor implementation details; moreover, note 
that "Releted checkers:" (instead of "related") is a typo in the commit message.

I also have a less concrete question about the root cause of these bugs. Does 
this commit fix the "root" of the issue by eliminating some misuse of correctly 
implemented (but perhaps surprising) `SVal` / `APSInt` arithmetic; or is there 
an underlying bug in the `SVal` / `APSInt` arithmetic which is just avoided 
(and not eliminated) by this commit? In the latter case, what obstacles prevent 
us from fixing the root cause?




Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:202-203
   // CHECK UPPER BOUND
-  DefinedOrUnknownSVal Size = getDynamicExtent(state, Reg, svalBuilder);
+  SVal Size = svalBuilder.convertToArrayIndex(
+  getDynamicExtent(state, Reg, svalBuilder));
   if (auto KnownSize = Size.getAs()) {

I wonder whether it would be better to move this conversion into the definition 
of `getDynamicExtent` to ensure that it has a consistent return type. Are there 
any situations where it's useful that `getDynamicExtent` can return something 
that's not an `ArrayIndexTy`?



Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:180-183
+  SVal CountReached =
+  SVB.evalBinOp(State, BO_GE, Idx, Count, ASTCtx.BoolTy);
+  if (!CountReached.isUndef() &&
+  State->assume(*CountReached.getAs(), true))

I think checking the nullness of `getAs()` is more elegant than using a 
separate `isUndef()` check.

On a longer term / as a separate improvement, I'd also think about allowing 
`UndefinedVal` as the argument of the `assert()`-like functions, because the 
`evalBinOp` -> `assert` combination is very common in checkers and IIRC in most 
checkers the branch of `UndefinedVal` will produce the same result as 
`UnknownVal`.



Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:186-192
   const ElementRegion *const ER = RegionManager.getElementRegion(
-  CE.getArgExpr(1)->getType()->getPointeeType(), Idx, SuperRegion,
-  Ctx.getASTContext());
+  ElemType,
+  SVB.evalBinOp(State, BO_Add, Idx, MROffset, SVB.getArrayIndexType())
+  .castAs(),
+  SuperRegion, Ctx.getASTContext());
 
   ReqRegions.push_back(ER->getAs());

I'd use `getAs()` and a conditional to avoid crashes in the 
(theoretical) case that `evalBinOp` returns `UnknownVal`; and I suspect that 
`getAs()` is superfluous because `MemRegion` is a base class of 
`ElementRegion`.



Comment at: clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp:92
+
+  return ElementCount.castAs();
+}

Are you sure that this cannot cause crashes? (E.g. did you check that there is 
no corner case when `getElementExtent` returns 0?)

I can accept this cast, especially if you have a clear proof that it's valid, 
but I'd prefer a more defensive solution that turns `UndefinedVal` into 
`UnknownVal` either here or preferably in the `assert()` function family that 
consumes the results from functions like this. 



Comment at: clang/test/Analysis/array-bound-v2-constraint-check.c:1
-// RUN: %clang_analyze_cc1 
-analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection \
+// RUN: %clang_analyze_cc1 
-analyzer-checker=core,unix,alpha.security.ArrayBoundV2,debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false -verify %s

Perhaps only enable `unix.Malloc` to ensure that this test is not affected by 
changes to other checkers in the `unix` group. (It's enough for the testcase 
that you added.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158499: [analyzer] Compute FAM dynamic size

2023-08-23 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D158499#4608974 , @danix800 wrote:

> One of the observable issue with inconsistent size type is
>
>   void clang_analyzer_eval(int);
>   
>   typedef unsigned long long size_t;
>   void *malloc(unsigned long size);
>   void free(void *);
>   
>   void symbolic_longlong_and_int0(long long len) {
> char *a = malloc(5);
> (void)a[len + 1]; // no-warning
> // len: [-1,3]
> clang_analyzer_eval(-1 <= len && len <= 3); // expected-warning {{TRUE}}
> clang_analyzer_eval(0 <= len);  // expected-warning 
> {{UNKNOWN}}
> clang_analyzer_eval(len <= 2);  // expected-warning 
> {{UNKNOWN}}
> free(a);
>   }
>
> which is extracted from 
> `clang/test/Analysis/array-bound-v2-constraint-check.c`,
> with `DynamicMemoryModeling` turned on,
> the second warning does not hold anymore: `clang_analyzer_eval(0 <= len);`
> will be reported as `TRUE` which is not expected.
>
> `DynamicMemoryModeling` will record the extent of allocated memory as `5ULL`,
> `ArrayBoundV2` will do `len + 1 < 5ULL` assumption, simplified to `len < 
> 4ULL`,
> which casts `len` to unsigned, dropping `-1`, similar to
>
>   void clang_analyzer_eval(int);
>   
>   void test(int len) {
> if (len >= -1 && len <= 4U) { // len is promoted into unsigned, thus can 
> never be negative
> clang_analyzer_eval(0 <= len);  // TRUE
> }
>   }

This issue is very interesting and scary -- I wouldn't have guessed that the 
core constraint handling algorithms are so buggy that things like this can 
happen . I reproduced the issue and double-checked that the logic error is 
//not// in ArrayBoundV2 (despite that its simplification algorithm is messy and 
suspicious); I'd be very happy if this issue could be resolved (perhaps along 
with other inaccuracies of APSInt math...).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158499/new/

https://reviews.llvm.org/D158499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154603: [analyzer][clangsa] Add new option to alpha.security.cert.InvalidPtrChecker

2023-08-23 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

The change looks promising, I only have minor remarks.




Comment at: clang/lib/StaticAnalyzer/Checkers/cert/InvalidPtrChecker.cpp:125
 
-  const NoteTag *Note =
-  C.getNoteTag([SymbolicEnvPtrRegion, FunctionName](
-   PathSensitiveBugReport , llvm::raw_ostream ) {
-if (!BR.isInteresting(SymbolicEnvPtrRegion))
-  return;
-Out << '\'' << FunctionName
-<< "' call may invalidate the environment parameter of 'main'";
-  });
+  ExplodedNode *CurrentChainEnd = nullptr;
+

gamesh411 wrote:
> steakhal wrote:
> > donat.nagy wrote:
> > > Perhaps add a comment that clarifies that passing a `nullptr` as the 
> > > ExplodedNode to `addTransition` is equivalent to specifying the current 
> > > node. I remember this because I was studying its implementation recently, 
> > > but I would've been surprised and suspicious otherwise.
> > If `nullptr` is equivalent with `C.getPredecessor()` inside 
> > `addTransition()`, why not simply initialize it to that value instead of to 
> > `nullptr`?
> I ended up using C.getPredecessor() instead of explaining; this seems a bit 
> more intuitive (if such a thing even exists in CSA).
> if such a thing even exists in CSA 
 




Comment at: clang/lib/StaticAnalyzer/Checkers/cert/InvalidPtrChecker.cpp:121-122
+const NoteTag *Note =
+C.getNoteTag([Region, FunctionName = SmallString<64>{FunctionName},
+  Message = SmallString<256>{Message}](
+ PathSensitiveBugReport , llvm::raw_ostream ) {

Minor nitpick: in situations like this, when we want to save an already 
composed string, `std::string` is better than `SmallString` because it doesn't 
occupy more memory than the actual length of the string. (OTOH `SmallString` is 
better for buffer variables, I've seen code that creates a SmallString, 
composes the message in it, then converts it to a `std::string` for longer-term 
storage.)

Of course these are all just inconsequential micro-optimization details...



Comment at: clang/test/Analysis/invalid-ptr-checker.c:10
+// RUN: alpha.security.cert.env.InvalidPtr:InvalidatingGetEnv=true \
+// RUN: -analyzer-output=text -verify=pedantic -Wno-unused %s
+

Use `-verify=expected,pedantic` here and then you can eliminate the 
`pedantic-warning` lines that duplicate the messages of `expected-warning`s.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154603/new/

https://reviews.llvm.org/D154603

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-08-21 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D158156#4604242 , @steakhal wrote:

> One more thing to mention. Its usually illadvised to rename files or do 
> changes that would seriously impact git blame, unless we have a really good 
> reason doing so.
> Aesthetics usually don't count one, especially if the name is not 
> user-facing. However, maintainability is another axis, thus as it's always, 
> not black and white.

Git (e.g. git blame) can be smart enough to recognize renamed files if there 
aren't too much changes, but this change may be too large for that (at least 
Phabricator doesn't recognize it as a move, which confused me at first when I 
looked at this review). I'd suggest keeping the old filename in this commit; if 
you wish (and the community accepts it) you could rename the file in a separate 
followup NFC commit that doesn't do anything else.

In D158156#4604221 , @steakhal wrote:

> PS: sorry if any of my comments are dups, or outdated. I only had a little 
> time last week, and things have progressed since then I noticed. I still 
> decided to submit my possibly outdated and partial review comments. I hope 
> you understand.
> Its quite difficult to allocate time to do reviews from day to to day work. I 
> unfortunately usually do this in my spare time, if I have.

Thanks for reviewing our commits, it's very helpful for us.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-21 Thread Donát Nagy via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG3e014038b373: [analyzer] Improve underflow handling in 
ArrayBoundV2 (authored by donat.nagy).

Changed prior to commit:
  https://reviews.llvm.org/D157104?vs=549881=552025#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

Files:
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/test/Analysis/out-of-bounds.c


Index: clang/test/Analysis/out-of-bounds.c
===
--- clang/test/Analysis/out-of-bounds.c
+++ clang/test/Analysis/out-of-bounds.c
@@ -151,11 +151,23 @@
 // Don't warn when indexing below the start of a symbolic region's whose
 // base extent we don't know.
 int *get_symbolic(void);
-void test_index_below_symboloc(void) {
+void test_underflow_symbolic(void) {
   int *buf = get_symbolic();
   buf[-1] = 0; // no-warning;
 }
 
+// But warn if we understand the internal memory layout of a symbolic region.
+typedef struct {
+  int id;
+  char name[256];
+} user_t;
+
+user_t *get_symbolic_user(void);
+char test_underflow_symbolic_2() {
+  user_t *user = get_symbolic_user();
+  return user->name[-1]; // expected-warning{{Out of bound memory access}}
+}
+
 void test_incomplete_struct(void) {
   extern struct incomplete incomplete;
   int *p = (int *)
Index: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -172,13 +172,18 @@
 return;
 
   NonLoc ByteOffset = RawOffset->getByteOffset();
+  const SubRegion *Reg = RawOffset->getRegion();
 
   // CHECK LOWER BOUND
-  const MemSpaceRegion *SR = RawOffset->getRegion()->getMemorySpace();
-  if (!llvm::isa(SR)) {
-// A pointer to UnknownSpaceRegion may point to the middle of
-// an allocated region.
-
+  const MemSpaceRegion *Space = Reg->getMemorySpace();
+  if (!(isa(Reg) && isa(Space))) {
+// A symbolic region in unknown space represents an unknown pointer that
+// may point into the middle of an array, so we don't look for underflows.
+// Both conditions are significant because we want to check underflows in
+// symbolic regions on the heap (which may be introduced by checkers like
+// MallocChecker that call SValBuilder::getConjuredHeapSymbolVal()) and
+// non-symbolic regions (e.g. a field subregion of a symbolic region) in
+// unknown space.
 auto [state_precedesLowerBound, state_withinLowerBound] =
 compareValueToThreshold(state, ByteOffset,
 svalBuilder.makeZeroArrayIndex(), svalBuilder);
@@ -194,8 +199,7 @@
   }
 
   // CHECK UPPER BOUND
-  DefinedOrUnknownSVal Size =
-  getDynamicExtent(state, RawOffset->getRegion(), svalBuilder);
+  DefinedOrUnknownSVal Size = getDynamicExtent(state, Reg, svalBuilder);
   if (auto KnownSize = Size.getAs()) {
 auto [state_withinUpperBound, state_exceedsUpperBound] =
 compareValueToThreshold(state, ByteOffset, *KnownSize, svalBuilder);


Index: clang/test/Analysis/out-of-bounds.c
===
--- clang/test/Analysis/out-of-bounds.c
+++ clang/test/Analysis/out-of-bounds.c
@@ -151,11 +151,23 @@
 // Don't warn when indexing below the start of a symbolic region's whose
 // base extent we don't know.
 int *get_symbolic(void);
-void test_index_below_symboloc(void) {
+void test_underflow_symbolic(void) {
   int *buf = get_symbolic();
   buf[-1] = 0; // no-warning;
 }
 
+// But warn if we understand the internal memory layout of a symbolic region.
+typedef struct {
+  int id;
+  char name[256];
+} user_t;
+
+user_t *get_symbolic_user(void);
+char test_underflow_symbolic_2() {
+  user_t *user = get_symbolic_user();
+  return user->name[-1]; // expected-warning{{Out of bound memory access}}
+}
+
 void test_incomplete_struct(void) {
   extern struct incomplete incomplete;
   int *p = (int *)
Index: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -172,13 +172,18 @@
 return;
 
   NonLoc ByteOffset = RawOffset->getByteOffset();
+  const SubRegion *Reg = RawOffset->getRegion();
 
   // CHECK LOWER BOUND
-  const MemSpaceRegion *SR = RawOffset->getRegion()->getMemorySpace();
-  if (!llvm::isa(SR)) {
-// A pointer to UnknownSpaceRegion may point to the middle of
-// an allocated region.
-
+  const MemSpaceRegion *Space = Reg->getMemorySpace();
+  if (!(isa(Reg) && isa(Space))) {
+// A symbolic region in unknown space represents an unknown pointer that
+   

[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-21 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Yup, there was a superfluous line break that was corrected by git clang-format; 
thanks for bringing it to my attention. As this is a very trivial change, I'll 
merge the fixed commit directly, without uploading it into Phabricator.




Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:202-203
   // CHECK UPPER BOUND
   DefinedOrUnknownSVal Size =
-  getDynamicExtent(state, RawOffset->getRegion(), svalBuilder);
+  getDynamicExtent(state, Reg, svalBuilder);
   if (auto KnownSize = Size.getAs()) {

This will be a single line in the commit that I'll merge.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-08-21 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

By the way, what would you think about putting the implementation of the two 
checkers (DeleteWithNonVirtualDtor, ArrayDelete) into two separate checker 
classes and placing the shared code into a common base class of these classes? 
I think this could be a significant simplification: we could avoid the manual 
management of the two checker names and bug types and register two regular 
checker objects instead of the one dummy prerequisite checker + two "fake" 
checker registrations that just toggle boolean flags.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-21 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

A (somewhat delayed) ping to @steakhal because you asked me to tell you when I 
have the results. (If you've already seen the table that I uploaded on Friday, 
then you've nothing to do.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-18 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

The results on open-source projects are depressing, but acceptable. This 
checker is looking for a serious defect, so it doesn't find any true positives 
on stable versions of open-source projects; however it produces a steady 
trickle of false positives because the Clang SA engine regularly misinterprets 
complicated code. As this patch allows this checker to analyze more situations, 
it introduces no true positives and a manageable amount of false positives (on 
average ~1/project).

Table of raw results:

| memcached | New reports 

   | Lost reports 

   | no change  
 |
| tmux  | New reports 

 | Lost reports 

 | no change
   |
| twin  | New reports 

   | Lost reports 

   | no change  
 |
| vim   | New reports 

   | Lost reports 

   | no change  
 |
| openssl   | New reports 

 | Lost reports 

 | no change   |
| sqlite| New reports 

   | Lost reports 

   | no change  
 |
| ffmpeg| New reports 

   | Lost reports 

   | four new reports (probably FPs), two of them 
are from the same macro|
| postgres  | New reports 

   | Lost reports 

   | two new false positives
 |
| tinyxml2  | New reports 

 | Lost reports 

 | no change  

[PATCH] D158156: [analyzer] Add C++ array delete checker

2023-08-18 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

This checker is a straightforward, clean implementation of the SEI-CERT rule 
EXP51-CPP, it'll be a good addition to Static Analyzer. I'm satisfied with the 
implementation, although there might be others who have a more refined taste in 
C++ coding style. The only issue that I found is the misleading name of the 
"common ancestor" hidden checker that I mentioned in the inline comment.

The test suite nicely covers all the basic situations, but I'd be happy to see 
some corner cases like virtual and non-virtual diamond inheritance or even a 
degenerate example like

  class Base { /*...*/ };
  class Derived: public Base {
Base basesAsMember[10];
/*...*/
  };
  void f() {
Derived *d = new Derived(...);
delete[] (d->basesAsMember);
  }




Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:757-760
+def CXXArrayModeling: Checker<"CXXArrayModeling">,
+  HelpText<"The base of a few CXX array related checkers.">,
+  Documentation,
+  Hidden;

The name and help text of this hidden dependency is very misleading: it's not 
particularly connected to arrays (the connection between the two child checkers 
is that they both handle the "delete" operator) and it's not a modeling checker 
(which would provide function evaluation, constraints and transitions) but an 
"empty shell" checker that does nothing by itself, but provides a singleton 
checker object where the registration of the child checkers can enable the two 
analysis modes.

Please rename this to `CXXDeleteHelper` or something similar.



Comment at: clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp:88
+  const auto *BaseClassRegion = MR->getAs();
+  const auto *DerivedClassRegion = 
MR->getBaseRegion()->getAs();
+  if (!BaseClassRegion || !DerivedClassRegion)

This logic (which is inherited from the earlier checker) is a bit surprising 
because it strips //all// Element, Field, BaseObject and DerivedObject layers 
from the MemRegion object.

I think it works in practice because
(1) there is an `isDerivedFrom() check after it and
(2) it's rare to write (buggy) code like
```lang=c++
struct Inner { /*...*/ };
struct Outer { Inner inner; /*...*/ };
void f() {
  Outer *o = new Outer(...);
  delete(o->inner);
}
```

Perhaps this could be explained by a comment if you feel that it's warranted; 
but I mostly wrote this as a note for myself and other reviewers. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158156/new/

https://reviews.llvm.org/D158156

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-18 Thread Donát Nagy via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG25b9696b61e5: [analyzer] Upstream BitwiseShiftChecker 
(authored by donat.nagy).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/ReleaseNotes.rst
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-16 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 550706.
donat.nagy added a comment.

A few minutes ago I accidentally re-uploaded the previous version of the patch; 
now I'm fixing this by uploading the actually updated variant.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/ReleaseNotes.rst
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 'get_uint8_max'}}
+  (void)(TCP_MAXWIN << shift_amount); // 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-16 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 2 inline comments as done.
donat.nagy added inline comments.



Comment at: clang/docs/analyzer/checkers.rst:81
+
+Ensure the shift operands are in proper range before shifting.
+

donat.nagy wrote:
> steakhal wrote:
> > We should exactly elaborate on what "proper" means here.
> What would you suggest? I could try to write a slightly longer suggestion, 
> but I don't want to repeat the same conditions that I listed above the code 
> example.
(I'm marking this request as "Done" because I couldn't find a better wording. 
Feel free to reopen the discussion if you wish.)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-16 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 550703.
donat.nagy added a comment.

Updated the release notes. This could be the final version of this commit if 
you agree that it's good enough.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 'get_uint8_max'}}
+  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{Left shift by '255' overflows the capacity 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-16 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D156312#4589251 , @steakhal wrote:

> I don't think we should do anything about it unless it's frequent enough.
> Try to come up with a heuristic to be able to measure how often this happens, 
> if you really care.
> Once you have a semi-working heuristic for determining if that bugpath 
> suffers from this, you can as well use it for marking the bugreport invalid 
> if that's the case to lean towards suppressing those FPs.

Minor clarification: these are not FPs, these are true positives with a bad 
error message. I was annoyed when I found this surprising bug on the 
almost-ready checker that I was working on; but I wouldn't say that this is an 
especially severe issue.

In D156312#4589251 , @steakhal wrote:

> I don't think it's completely necessary to fix those FPs to land this. I 
> think of that as an improvement, on top of this one.
> I hope I clarified my standpoint.

I agree, I'll create a new revision which mentions this checker in the release 
notes, and I hope that I'll be able to merge that. Later I'll try to return to 
this question with either an engine improvement or a heuristic mitigation.




Comment at: clang/test/Analysis/bitwise-shift-sanity-checks.c:70-74
+  if (right < 0)
+return 0;
+  return left << (right + 32);
+  // expected - warning@-1 {{Left shift overflows the capacity of 'int'}}
+  // expected-warning@-2 {{Right operand is negative in left shift}}

steakhal wrote:
> Let's be explicit about the actual assumed value range, and use 
> `clang_analyzer_value()`.
> I also recommend using an explicit FIXME for calling out what should be 
> there, instead of abusing a non-matching `expected` pattern. I know I used 
> that in the past, and probably seen it from me. I feel ashamed that I did 
> that. I know I did that to have cleaner diffs, once fixed, but I honestly 
> think it does more harm than good.
I already tried calling `clang_analyzer_value()` at that point when I was 
investigating, but unfortunately it won't say anything useful because the 
actual range corresponding to the symbolic expression is only calculated when 
the `evalBinOp()` calls examine it.

I'll change the "expected - warning" to a FIXME.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-15 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

(The release notes entry is still missing, I'll try to add it tomorrow.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-15 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 550318.
donat.nagy edited the summary of this revision.
donat.nagy added a comment.

I verified that the checker handles the examples in the documentation correctly 
(and added them to the test suite). However, as I was tweaking the examples in 
the documentation, I accidentally found a situation where the checker produces 
a very surprising and arguably incorrect error message.

After investigating this issue, I added the testcases 
`signed_aritmetic_{good,bad}` which document the current sub-optimal state. The 
root cause of this problem is a high-level property of the engine (that it 
assumes that signed overflows are always possible and acceptable) and I don't 
see a local workaround that would silence or fix these incorrect error messages.

@steakhal @NoQ What do you know about these signed overflow issues (I presume 
that analogous issues also appear in other checkers)? How should we handle this 
limitation of this checker?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-14 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 549978.
donat.nagy marked 7 inline comments as done.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 'get_uint8_max'}}
+  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{Left shift by '255' overflows the capacity of 'int'}}
+  // expected-note@-1{{The result of left 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-14 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 13 inline comments as done.
donat.nagy added a comment.

I handled the inline comments, I'll check the example code and edit the release 
notes tomorrow.




Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:272
+  if (const auto ConcreteRight = Right.getAs()) {
+const std::int64_t RHS = ConcreteRight->getValue().getExtValue();
+assert(RHS > MaximalAllowedShift);

steakhal wrote:
> `getExtValue()`, I believe, asserts that it "fits", thus the concrete int is 
> at most 64 bits wide. Does it work for `_BigInt`s? (or whatever we have like 
> that :D)
This final check is executed when we already know that 0 <= ConcreteRight < 128 
//(or the bit width of the largest integer type)//. I added a comment that 
mentions this and replaced the type `std::int64_t` with a plain `unsigned`.

I also updated the testcase `large_right` to ensure that later changes don't 
introduce crashes related to absurdly oversized right arguments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:335-339
+auto R =
+std::make_unique(BT, ShortMsg, Msg, ErrNode);
+bugreporter::trackExpressionValue(ErrNode, Op->getLHS(), *R);
+bugreporter::trackExpressionValue(ErrNode, Op->getRHS(), *R);
+return R;

steakhal wrote:
> AHA! Now I got you. You also call this `R`.
> Just use `R` at the callsite as well. Job done.
This is a code fragment that survived my refactoring efforts without 
significant changes... until now :)

Now that you highlighted it, I'm renaming `R` to `BR` just to be a contrarian :)



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:355-357
+if (!BTPtr)
+  BTPtr = std::make_unique(this, "Bitwise shift",
+"Suspicious operation");

steakhal wrote:
> How about eagerly inline initializing this field? And avoiding `mutable` as 
> well.
This pattern systematically appears in all checkers that I've seen because the 
constructor of `BugType` queries and stores the name of the checker, which is 
only initialized when the checker (`*this`) is registered. This dependency 
means that the `BugType` cannot be initialized in the constructor of the 
`Checker`.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:53
+class BitwiseShiftValidator {
+  // Primary mutable state:
+  CheckerContext 

steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > Where does this comment corresponds to?
> > The two data members (`Ctx` and `State`) that are directly below the 
> > comment.
> > 
> > I tried to split the list of data members into three groups (primary 
> > mutable state, secondary mutable state, `const`  members), but now I 
> > reconsider this I feel that these header comments are actually superfluous.
> > 
> > I'll return to a variant of the earlier layout where I put 
> > `NonNegFlags`/`UpperBound` to the end and comment that they are only used 
> > for note tag creation; but don't emphasize the (immediately visible) 
> > `const`ness / non-`const`-ness of other members with comments.
> One way to emphasize this is by choosing names like `FoldedState`, or 
> anything else that has something to do with "motion" or "change".
> Given that if we have `ProgramStateRefs` in helper classes they usually 
> remain still. Think of BugReportVisitors for example. Consequently,it's 
> important to raise awareness when it's not like that. A different name would 
> achieve this.
My intuition is that the variable name "State" already implies that it will be 
regularly changed and updated as we are going forward; but you're right that in 
this project we've a tradition of storing snapshots of "old" states in various 
helper classes.

(By the way, I'm a bit worried when I read code that stores and passes around 
random stale snapshots of the state. Of course we cannot avoid the branching 
nature of the exploded graph, but other than that it'd be good to store the 
State in language constructs where you cannot accidentally lose information by 
building onto an older state variable.) 



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:60-63
+  // If there is an upper bound, it is <= the size of a primitive type
+  // (expressed in bits), so this is a very safe sentinel value:
+  enum { NoUpperBound = 999 };
+  unsigned UpperBound = NoUpperBound;

steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > Have you considered using `std::optional`?
> > > Given that the dimension of this variable is "bits", I think it would be 
> > > great to have that in its name.
> > I considered using `std::optional`, but using this "old-school" solution 
> > ensures that `NoUpperBound` is comparable to and larger than the "real" 
> > upper bound values, which lets me avoid some ugly boolean logic. I'll 
> > 

[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-14 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 2 inline comments as done.
donat.nagy added a comment.

I didn't upload the test results yet because right now there is some 
incompatibility between our local fork and the upstream. I hope that it'll be 
fixed soon, but until then I can't use our automated tools to run the analysis 
with revisions that are close to the upstream main branch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-14 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 549881.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

Files:
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/test/Analysis/out-of-bounds.c


Index: clang/test/Analysis/out-of-bounds.c
===
--- clang/test/Analysis/out-of-bounds.c
+++ clang/test/Analysis/out-of-bounds.c
@@ -151,11 +151,23 @@
 // Don't warn when indexing below the start of a symbolic region's whose
 // base extent we don't know.
 int *get_symbolic(void);
-void test_index_below_symboloc(void) {
+void test_underflow_symbolic(void) {
   int *buf = get_symbolic();
   buf[-1] = 0; // no-warning;
 }
 
+// But warn if we understand the internal memory layout of a symbolic region.
+typedef struct {
+  int id;
+  char name[256];
+} user_t;
+
+user_t *get_symbolic_user(void);
+char test_underflow_symbolic_2() {
+  user_t *user = get_symbolic_user();
+  return user->name[-1]; // expected-warning{{Out of bound memory access}}
+}
+
 void test_incomplete_struct(void) {
   extern struct incomplete incomplete;
   int *p = (int *)
Index: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -172,13 +172,18 @@
 return;
 
   NonLoc ByteOffset = RawOffset->getByteOffset();
+  const SubRegion *Reg = RawOffset->getRegion();
 
   // CHECK LOWER BOUND
-  const MemSpaceRegion *SR = RawOffset->getRegion()->getMemorySpace();
-  if (!llvm::isa(SR)) {
-// A pointer to UnknownSpaceRegion may point to the middle of
-// an allocated region.
-
+  const MemSpaceRegion *Space = Reg->getMemorySpace();
+  if (!(isa(Reg) && isa(Space))) {
+// A symbolic region in unknown space represents an unknown pointer that
+// may point into the middle of an array, so we don't look for underflows.
+// Both conditions are significant because we want to check underflows in
+// symbolic regions on the heap (which may be introduced by checkers like
+// MallocChecker that call SValBuilder::getConjuredHeapSymbolVal()) and
+// non-symbolic regions (e.g. a field subregion of a symbolic region) in
+// unknown space.
 auto [state_precedesLowerBound, state_withinLowerBound] =
 compareValueToThreshold(state, ByteOffset,
 svalBuilder.makeZeroArrayIndex(), svalBuilder);
@@ -195,7 +200,7 @@
 
   // CHECK UPPER BOUND
   DefinedOrUnknownSVal Size =
-  getDynamicExtent(state, RawOffset->getRegion(), svalBuilder);
+  getDynamicExtent(state, Reg, svalBuilder);
   if (auto KnownSize = Size.getAs()) {
 auto [state_withinUpperBound, state_exceedsUpperBound] =
 compareValueToThreshold(state, ByteOffset, *KnownSize, svalBuilder);


Index: clang/test/Analysis/out-of-bounds.c
===
--- clang/test/Analysis/out-of-bounds.c
+++ clang/test/Analysis/out-of-bounds.c
@@ -151,11 +151,23 @@
 // Don't warn when indexing below the start of a symbolic region's whose
 // base extent we don't know.
 int *get_symbolic(void);
-void test_index_below_symboloc(void) {
+void test_underflow_symbolic(void) {
   int *buf = get_symbolic();
   buf[-1] = 0; // no-warning;
 }
 
+// But warn if we understand the internal memory layout of a symbolic region.
+typedef struct {
+  int id;
+  char name[256];
+} user_t;
+
+user_t *get_symbolic_user(void);
+char test_underflow_symbolic_2() {
+  user_t *user = get_symbolic_user();
+  return user->name[-1]; // expected-warning{{Out of bound memory access}}
+}
+
 void test_incomplete_struct(void) {
   extern struct incomplete incomplete;
   int *p = (int *)
Index: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -172,13 +172,18 @@
 return;
 
   NonLoc ByteOffset = RawOffset->getByteOffset();
+  const SubRegion *Reg = RawOffset->getRegion();
 
   // CHECK LOWER BOUND
-  const MemSpaceRegion *SR = RawOffset->getRegion()->getMemorySpace();
-  if (!llvm::isa(SR)) {
-// A pointer to UnknownSpaceRegion may point to the middle of
-// an allocated region.
-
+  const MemSpaceRegion *Space = Reg->getMemorySpace();
+  if (!(isa(Reg) && isa(Space))) {
+// A symbolic region in unknown space represents an unknown pointer that
+// may point into the middle of an array, so we don't look for underflows.
+// Both conditions are significant because we want to check underflows in
+// symbolic regions on the heap (which may be introduced by checkers like
+// MallocChecker that call 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-10 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 549091.
donat.nagy marked 2 inline comments as done.
donat.nagy added a comment.

Uploading a new version based on the reviews.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 'get_uint8_max'}}
+  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{Left shift by '255' 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-10 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 21 inline comments as done.
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:29-30
+
+using namespace clang;
+using namespace ento;
+

steakhal wrote:
> This is used quite often and hinders the readability by indenting the args a 
> lot. Making the spelling shorter, would somewhat mitigate this.
Good idea!



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:57-59
+  // Secondary mutable state, used only for note tag creation:
+  enum { NonNegLeft = 1, NonNegRight = 2 };
+  unsigned NonNegFlags = 0;

donat.nagy wrote:
> steakhal wrote:
> > How about using the same enum type for these two? It would signify that 
> > these are used together.
> > Also, by only looking at the possible values, it's not apparent that these 
> > are bitflag values. A suffix might help the reader.
> Using the enum type is a good idea, however I'd probably put the `Flag` 
> suffix into the name of the type:
> ```
> enum SignednessFlags : unsigned { NonNegLeft = 1, NonNegRight = 2 };
> SignednessFlags NonNegOperands = 0;
> ```
I rename the member `NonNegFlags` to `NonNegOperands` but its type remains a 
plain `unsigned` because otherwise using the or-assign operator like 
`NonNegOperands |= NonNegLeft` produces the error
> Assigning to 'enum SignednessFlags' from incompatible type 'unsigned int' 
> (clang typecheck_convert_incompatible)
(note that both operands were from the same enum type).



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:218
+   Side == OperandSide::Left ? "Left" : 
"Right",
+   isLeftShift() ? "left" : "right")
+ .str();

steakhal wrote:
> The `isLeftShift() ? "left" : "right"` expression appears 4 times in total.
> I'd also recommend to not distinguish the capitalized "left" and "right" 
> strings, and just post-process the messages to ensure that the first 
> character is capitalized inside the `createBugReport()`.
> 
> Also note that you can reuse the same placeholder, so you would only need to 
> pass the `side()` only once.
> ```lang=C++
> static void capitalizeBeginning(std::string ) {
>   if (Str.empty())
> return;
>   Str[0] = llvm::toUpper(Str[0]);
> }
> 
> StringRef side() const {
>   return isLeftShift() ? "left" : "right";
> }
> ```
I introduced a method for `isLeftShift() ? "left" : "right"` (which is indeed 
repeated several times), but I didn't define `capitalizeBeginning` because that 
would've been used only once.

Your remark that [with the capitalization function] "you can reuse the same 
placeholder, so you would only need to pass the side() only once" seems to be 
incorrect, because `Side == OperandSide::Left ? "Left" : "Right"` (which 
operand are we talking about?) and `isLeftShift() ? "left" : "right"` (what 
kind of shift operator are we talking about?) are two independent values.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-10 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 4 inline comments as done.
donat.nagy added a comment.

Thanks for the suggestions, I answered those where I had something to say and 
I'll implement the rest of them.

In D156312#4576120 , @steakhal wrote:

> Have you considered checking the shift operands for taint?
> I wonder if we could warn if we cannot prove the correct use of the shift 
> operator when either of the operands is tainted.

It would be straightforward to add taint handling; but I'd prefer to leave it 
for a follow-up commit. Some experimentation would be needed to see whether it 
produces useful results (do real-world projects use shifts on tainted numbers? 
do we encounter false positives?).




Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:53
+class BitwiseShiftValidator {
+  // Primary mutable state:
+  CheckerContext 

steakhal wrote:
> Where does this comment corresponds to?
The two data members (`Ctx` and `State`) that are directly below the comment.

I tried to split the list of data members into three groups (primary mutable 
state, secondary mutable state, `const`  members), but now I reconsider this I 
feel that these header comments are actually superfluous.

I'll return to a variant of the earlier layout where I put 
`NonNegFlags`/`UpperBound` to the end and comment that they are only used for 
note tag creation; but don't emphasize the (immediately visible) `const`ness / 
non-`const`-ness of other members with comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:57-59
+  // Secondary mutable state, used only for note tag creation:
+  enum { NonNegLeft = 1, NonNegRight = 2 };
+  unsigned NonNegFlags = 0;

steakhal wrote:
> How about using the same enum type for these two? It would signify that these 
> are used together.
> Also, by only looking at the possible values, it's not apparent that these 
> are bitflag values. A suffix might help the reader.
Using the enum type is a good idea, however I'd probably put the `Flag` suffix 
into the name of the type:
```
enum SignednessFlags : unsigned { NonNegLeft = 1, NonNegRight = 2 };
SignednessFlags NonNegOperands = 0;
```



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:60-63
+  // If there is an upper bound, it is <= the size of a primitive type
+  // (expressed in bits), so this is a very safe sentinel value:
+  enum { NoUpperBound = 999 };
+  unsigned UpperBound = NoUpperBound;

steakhal wrote:
> Have you considered using `std::optional`?
> Given that the dimension of this variable is "bits", I think it would be 
> great to have that in its name.
I considered using `std::optional`, but using this "old-school" solution 
ensures that `NoUpperBound` is comparable to and larger than the "real" upper 
bound values, which lets me avoid some ugly boolean logic. I'll update the 
comment and probably also the variable name.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:106-109
+  if (auto BRP = checkOvershift()) {
+Ctx.emitReport(std::move(BRP));
+return;
+  }

steakhal wrote:
> I'd prefer `BR` or `Report`. Since we know already that we are in the 
> path-sensitive domain, `P` has less value there.
> I'd apply this concept everywhere in the patch.
The `P` stands for the "Ptr" in the name of the type alias `BugReportPtr`; but 
your comment clearly highlights that this variable name is confusing ;).

I'll probably avoid the `auto` and use `BugReportPtr Bug = ...` to remind the 
reader that this is a pointer (that can be null) and if it's non-null, then we 
found a bug.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:190-191
+  std::string Msg =
+  llvm::formatv("The result of {0} shift is undefined because the right "
+"operand{1} is not smaller than {2}, the capacity of 
'{3}'",
+isLeftShift() ? "left" : "right", RightOpStr, LHSBitWidth,

steakhal wrote:
> 
This is an important distinction, I use "not smaller than {2}" as a shorter 
synonym of "larger than or equal to {2}". I can choose some other wording if 
you feel that this is not clear enough.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:263
+
+  const SVal Right = Ctx.getSVal(Op->getRHS());
+

steakhal wrote:
> I'd move this closer to the first "use" place.
Good point, previously it was used by the calculation that is now performed by 
`assumeRequirement`.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:315
+default:
+  llvm_unreachable("this checker does not use other comparison operators");
+  }

steakhal wrote:
> 
I'd say that the current wording is correct because the the checker "use"s the 
comparison 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-09 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 6 inline comments as done.
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:94-98
+void BitwiseShiftValidator::run() {
+  if (isValidShift()) {
+Ctx.addTransition(State, createNoteTag());
+  }
+}

donat.nagy wrote:
> NoQ wrote:
> > Because `addTransition()` is a very confusing API that's very easy to 
> > misuse in subtle ways (causing unexpected state splits), I feel anxious 
> > when I see functions like
> > ```
> >   bool isValidShift();
> >   bool isOvershift();
> >   bool isOperandNegative(OperandSide Side);
> >   bool isLeftShiftOverflow();
> > ```
> > that look like boolean getters but in fact call `addTransition()` under the 
> > hood. If we could at least call them `checkOvershift()` etc., that'd be 
> > much better. Ideally I wish there was some safeguard against producing 
> > redundant transitions, like make them return an `ExplodedNode` and chain 
> > them, or something like that.
> These functions were originally called `checkXXX()` but I felt that the 
> meaning of their boolean return value was too confusing with that naming 
> scheme.
> 
> I'll try to ensure that `emitReport` or `addTransition` are only called on 
> the topmost layer (the `run()` method) and change the return types of 
> internal subroutines to make them return nullable pointers (e.g. 
> `std::unique_ptr`) instead of `bool`s. This would (1) 
> eliminate the "what does `true` mean at this point?" issues (2) clarify that 
> this checker never performs more than one transition. (If the shift is 
> invaild, we transition to a sink node by emitting a bug; if the shift is 
> vaild, we perform a transition to record the state update.)
I eliminated `isValidShift()` by merging it with `run()` (which was very short) 
and renamed the rest of the functions to `checkXXX`.

Now the exploded graph is only modified in the top-level method `run()`.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:142
+  BinaryOperator::Opcode 
Comparison,
+  long long Limit) {
+  SValBuilder  = Ctx.getSValBuilder();

donat.nagy wrote:
> steakhal wrote:
> > One does not frequently see the `long long` type.
> > Do you have a specific reason why `int` or `unsigned` wouldn't be good?
> I inherited this type choice from old code and didn't think much about it. My 
> guess is that `LimitVal` needs to be a signed integer (because e.g. `-1 < 0u` 
> would be evaluated to false) and it's a `long long` because that's guaranteed 
> to represent all `unsigned` values. (However, the actual values of `Limit` 
> are very small.)
> 
> I'll probably use `int` for `Limit` and `LimitVal` with some comments to 
> clarify the reason for this type choice.
I changed the type of the `Limit` argument to `unsigned` (because that's what 
this function is called with) and changed the type of `LimitVal` to `IntTy` 
with a comment that describes that it must be signed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-09 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 548626.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 'get_uint8_max'}}
+  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{Left shift by '255' overflows the capacity of 'int'}}
+  // expected-note@-1{{The result of left shift is undefined because the right operand 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-09 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked 2 inline comments as done.
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302
+  pluralSuffix(MaximalAllowedShift));
+R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(),
+Ctx.getLocationContext()});
+Ctx.emitReport(std::move(R));

donat.nagy wrote:
> donat.nagy wrote:
> > gamesh411 wrote:
> > > donat.nagy wrote:
> > > > NoQ wrote:
> > > > > Can we just append this to the warning? The `addNote()` is useful for 
> > > > > notes that need to be placed in code outside of the execution path, 
> > > > > but if it's always next to the warning, it probably doesn't make 
> > > > > sense to display it separately.
> > > > I didn't append this to the warning because I felt that the warning 
> > > > text is already too long. (By the way, an earlier internal variant of 
> > > > this checker produced two separate notes next to the warning message.)
> > > > 
> > > > I tweaked the messages of this checker before initiating this review, 
> > > > but I feel that there's still some room for improvements. (E.g. in this 
> > > > particular case perhaps we could omit some of the details that are not 
> > > > immediately useful and could be manually calculated from other parts of 
> > > > the message...) 
> > > Just a thought on simplifying the diagnostics a bit:
> > > 
> > > Warning: "Right operand is negative in left shift"
> > > Note: "The result of left shift is undefined because the right operand is 
> > > negative"
> > > Shortened: "Undefined left shift due to negative right operand"
> > > 
> > > Warning: "Left shift by '34' overflows the capacity of 'int'"
> > > Note: "The result of left shift is undefined because the right operand 
> > > '34' is not smaller than 32, the capacity of 'int'"
> > > Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 bits)"
> > > 
> > > The more complex notes are a bit sketchy, as relevant information can be 
> > > lost, and the following solution is probably a bit too much, but could 
> > > prove to be an inspiration:
> > > 
> > > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > > CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 
> > > 32 bits (including the sign bit), so some bits overflow"
> > > CXX Note: "The value '1024' is represented by 11 bits, allowing at most 
> > > 21 bits for bitshift"
> > > C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 
> > > bits (excluding the sign bit), so some bits overflow"
> > > C Note: "The value '1024' is represented by 11 bits, allowing at most 20 
> > > bits for bitshift"
> > > 
> > > Shortened:
> > > CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' 
> > > capacity (32 bits, including sign)"
> > > C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity 
> > > (31 bits, excluding sign)"
> > Clarification about the `Msg`/`ShortMsg` distinction:
> > I'm intentionally separating the short warning messages and the longer note 
> > messages because `createBugReport()` enforces the convention that it will 
> > always emit a warning and a note at the bug location.
> > 
> > According to the comments in the source code, the intention is that the 
> > note contains all the relevant information, while the warning is a brief 
> > summary that can be displayed in situations where the notes wouldn't fit 
> > the UI.
> > 
> > IIUC many checkers ignore this distinction and emit the same (often long 
> > and cumbersome) message both as a note and as a warning 
> > (`createBugReport()` has a variant which takes only one message parameter 
> > and passes it to both locations), but I tried to follow it because I think 
> > it's ugly when the same message is repeated twice and there is some sense 
> > in providing both a brief summary and a full description that doesn't use 
> > potentially-ambiguous abbreviations to save space.
> > 
> > Of course I could also accept a community decision that this "brief warning 
> > / full note" distinction is deprecated and will be eliminated (because e.g. 
> > front-end utilities are not prepared to handle it), but in that case I'd 
> > strongly suggest a redesign where we eliminate the redundantly printed 
> > 'note' message. (There is no reason to say the same thing twice! There is 
> > no reason to say the same thing twice!)
> > 
> > On the other hand, in addition to this `Msg`/`ShortMsg` distinction, this 
> > part of the code also adds the extra `LeftNote` (as a remnant from an 
> > earlier internal version of this checker), and that's that's what I'd like 
> > to merge into `Msg` (following NoQ's suggestion and keeping it distinct 
> > from the `ShortMsg`).
> Among notes, my only planned change is that instead of 
> 
> > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > CXX Note: "Left 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-09 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy planned changes to this revision.
donat.nagy marked 5 inline comments as done.
donat.nagy added a comment.

I'll soon upload a refactored version.




Comment at: clang/docs/analyzer/checkers.rst:44-50
+Moreover, if the pedantic mode is activated by
+``-analyzer-config core.BitwiseShift:Pedantic=true``, then this checker also
+reports situations where the _left_ operand of a shift operator is negative or
+overflow occurs during the right shift of a signed value. (Most compilers
+handle these predictably, but the C standard and the C++ standards before C++20
+say that they're undefined behavior. In the C++20 standard these constructs are
+well-defined, so activating pedantic mode in C++20 has no effect.)

steakhal wrote:
> I believe shifting out the "sign bit" is UB because the representation of the 
> signed integer type was not defined.
> Prior C++20 (?), it was allowed to represent signed integer types as 
> `two's-complement` (virtually every platform uses this), `one's complement`, 
> `sign-magnitude`. [[ 
> https://en.wikipedia.org/wiki/Signed_number_representations | Wiki ]].
> And it turns out, all these three representations behave differently with 
> negative values.
> Check out [[ https://youtu.be/JhUxIVf1qok?t=2089 | JF's talk ]] from 2018 
> about this.
> 
> I'd need to think about the relevance of C++20 in this context, but here is 
> what I think of now:
> My idea is that the observed behavior is not influenced by the "standard", 
> rather by the time we released C++20, they realized that no actual platform 
> uses weird signed integer representations.
> Given this observation, I'd argue that we shouldn't have this flag at all.
Thanks for the background information! I knew the rough outlines of the 
situation, but your links provided interesting details.

I agree that the non-pedantic mode is the "sane mode" that reflects the real 
behavior of the hardware; but the standards (with the exception of C++20) 
explicitly declare that signed shift overflow and shifting negative values is 
UB and there may be secondary requirements like [[ 
https://wiki.sei.cmu.edu/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow
 | SEI-CERT rule INT32-C  ]] that demand compliance to the standards.

I don't think that I'd spend time on implementing pedantic mode because there 
are more important things to do, but as we already have this feature, I think 
we should release it as an off-by-default option.




Comment at: clang/docs/analyzer/checkers.rst:81
+
+Ensure the shift operands are in proper range before shifting.
+

steakhal wrote:
> We should exactly elaborate on what "proper" means here.
What would you suggest? I could try to write a slightly longer suggestion, but 
I don't want to repeat the same conditions that I listed above the code example.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:142
+  BinaryOperator::Opcode 
Comparison,
+  long long Limit) {
+  SValBuilder  = Ctx.getSValBuilder();

steakhal wrote:
> One does not frequently see the `long long` type.
> Do you have a specific reason why `int` or `unsigned` wouldn't be good?
I inherited this type choice from old code and didn't think much about it. My 
guess is that `LimitVal` needs to be a signed integer (because e.g. `-1 < 0u` 
would be evaluated to false) and it's a `long long` because that's guaranteed 
to represent all `unsigned` values. (However, the actual values of `Limit` are 
very small.)

I'll probably use `int` for `Limit` and `LimitVal` with some comments to 
clarify the reason for this type choice.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:152-156
+if (!StTrue) {
+  // We detected undefined behavior (the caller will report it).
+  State = StFalse;
+  return false;
+}

steakhal wrote:
> Generally, I don't favor mutating member functions.
> It makes it harder to track how we ended up with a given State.
> 
I also agree that mutating member functions are often problematic and may make 
the code more difficult to understand.

However, in this particular case I think it's important to ensure that we're 
always using the most recent state, and that's difficult to guarantee in the 
"pass everything as arguments and return values" style. Moreover, I do some 
parallel bookkeeping because I want to summarize the state changes in a single 
note tag, and that would make the "naive functional" solution even more 
convoluted.

In a functional language like Haskell I'd use computations in a [[ 
https://en.wikibooks.org/wiki/Haskell/Understanding_monads/State | State monad 
]] to implement this logic; my C++ code is a direct translation of that 
solution. (As a secondary utility, my class also provides 

[PATCH] D156787: [analyzer][attr] Add docs to ownership_* attributes

2023-08-08 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/include/clang/Basic/AttrDocs.td:1189
+``onwership_returns``: Functions with this annotation return dynamic memory.
+The second annotation parameter is the size of the returned memory in bytes.
+

Szelethus wrote:
> aaron.ballman wrote:
> > I'm a bit confused on this last bit; how is the user supposed to statically 
> > know that value? I read this as claiming `my_malloc` returns 1 byte of 
> > dynamic storage; can the user tie the allocation to a parameter value 
> > instead?
> I share your confusion, I guess. What I've written here is my best 
> understanding of how this works, unfortunately, these annotations were made 
> in ~2010 (when I was still in elementary school) and abandoned since.
> 
> > I read this as claiming my_malloc returns 1 byte of dynamic storage;
> 
> Thats what the corresponding code in the Static Analyzer leads me to believe 
> as well.
> 
> > can the user tie the allocation to a parameter value instead?
> 
> No. 
I'd guess that this annotation was designed for factory functions that 
construct an instance of a concrete struct type on the heap. Allowing "generic" 
malloc variants (where the allocation size is specified by a parameter) would 
be a good additional feature, but I think that the current annotation also has 
its own uses.

Perhaps it would be good to provide examples like
```
  void __attribute((ownership_takes(malloc, 1))) free_widget(struct widget *);
  struct widget *__attribute((ownership_returns(malloc, sizeof(struct 
widget create_widget(void);
  void __attribute((ownership_holds(malloc, 1))) hold_widget(struct widget *);
```
that correspond to this use case. (By the way, does the checker handle 
non-`void*` pointers?)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156787/new/

https://reviews.llvm.org/D156787

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152436: [clang][analyzer] Move checker alpha.unix.StdCLibraryFunctions out of alpha.

2023-08-08 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Why is this checker placed in the "unix" group (instead of e.g. "core")? As far 
as I understand it can check some POSIX-specific functions with a flag (that's 
off by default) but the core functionality checks platform-independent standard 
library functions.

Moreover, if you can, please provide links to some test result examples on open 
source projects (either run some fresh tests or provide links to an earlier 
test run that was running with the same functionality).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152436/new/

https://reviews.llvm.org/D152436

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-04 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added reviewers: dkrupp, steakhal, Szelethus, gamesh411.
donat.nagy added a comment.

I'll try to upload results on open source projects, probably on Tuesday (I'm on 
vacation on Monday).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150446: [analyzer] Check ArraySubscriptExprs in ArrayBoundCheckerV2

2023-08-04 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy abandoned this revision.
donat.nagy added a comment.
Herald added a subscriber: wangpc.

I'm abandoning this commit because it amalgamates several unrelated changes and 
I think it'd be better to handle them separately:

1. First, there is a very simple, independent improvement related to underflows 
and UnknownSpaces. I already created the separate commit D157104 
 for this (reviews are welcome ;) ).
2. As the title of this commit says, I wanted to turn this checker into a 
`Checker>` instead of a 
`Checker`. I'm still planning to do this transition in a 
separate commit because I feel that this will be needed to compose good warning 
messages and there are a few situations like the testcase `test_field` where 
the `check::Location` model produces counter-intuitive results. (When I 
implement this, I'll also ensure that `*p` and `p->field` are handled 
analogously to `p[0]`, but perhaps these will be in a separate commit.)
3. Finally there is the "simplify `RegionRawOffsetV2::computeOffset` and fix 
multidimensional array handling" change. This is the modification that was 
responsible for the multitude of false positives caused by this commit; and I 
don't have a quick fix for it (the engine abuses `ElementRegion` to record 
pointer arithmetic and I didn't find a clear way to distinguish it from real 
element access). On the short-term I think I'll need to accept that this 
checker will produce false negatives in situations when one element of a 
multi-dimensional array is over-indexed without overflowing the whole array 
(e.g. if `arr` is declared as `int arr[5][5]`, then `arr[1][10]` over-indexes 
`arr[1]`, but points inside the full area of the matrix); but fortunately this 
is not a fatal limitation (it only produces false negatives, multi-dimensional 
arrays are not too common).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150446/new/

https://reviews.llvm.org/D150446

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-04 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Note: this commit is split off from the larger commit 
https://reviews.llvm.org/D150446, which combined this simple improvement with 
other, more complex changes that had problematic side-effects.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157104/new/

https://reviews.llvm.org/D157104

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157104: [analyzer] Improve underflow handling in ArrayBoundV2

2023-08-04 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy created this revision.
Herald added subscribers: steakhal, manas, ASDenysPetrov, martong, gamesh411, 
dkrupp, Szelethus, mikhail.ramalho, a.sidorin, szepet, baloghadamsoftware, 
xazax.hun.
Herald added a reviewer: NoQ.
Herald added a project: All.
donat.nagy requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This minor change ensures that underflow errors are reported on arrays that are 
in unknown space but have well-defined size.

As a concrete example, the following test case did not produce a warning 
previously, but will produce a warning after this patch:

  typedef struct {
int id;
char name[256];
  } user_t;
  
  user_t *get_symbolic_user(void);
  char test_underflow_symbolic_2() {
user_t *user = get_symbolic_user();
return user->name[-1];
  }


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D157104

Files:
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/test/Analysis/out-of-bounds.c


Index: clang/test/Analysis/out-of-bounds.c
===
--- clang/test/Analysis/out-of-bounds.c
+++ clang/test/Analysis/out-of-bounds.c
@@ -151,11 +151,23 @@
 // Don't warn when indexing below the start of a symbolic region's whose
 // base extent we don't know.
 int *get_symbolic(void);
-void test_index_below_symboloc(void) {
+void test_underflow_symbolic(void) {
   int *buf = get_symbolic();
   buf[-1] = 0; // no-warning;
 }
 
+// But warn if we understand the internal memory layout of a symbolic region.
+typedef struct {
+  int id;
+  char name[256];
+} user_t;
+
+user_t *get_symbolic_user(void);
+char test_underflow_symbolic_2() {
+  user_t *user = get_symbolic_user();
+  return user->name[-1]; // expected-warning{{Out of bound memory access}}
+}
+
 void test_incomplete_struct(void) {
   extern struct incomplete incomplete;
   int *p = (int *)
@@ -173,4 +185,3 @@
   buf[x] = 1;
   clang_analyzer_eval(x <= 99); // expected-warning{{TRUE}}
 }
-
Index: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -172,13 +172,17 @@
 return;
 
   NonLoc ByteOffset = RawOffset->getByteOffset();
+  const SubRegion *Reg = RawOffset->getRegion();
 
   // CHECK LOWER BOUND
-  const MemSpaceRegion *SR = RawOffset->getRegion()->getMemorySpace();
-  if (!llvm::isa(SR)) {
-// A pointer to UnknownSpaceRegion may point to the middle of
-// an allocated region.
-
+  const MemSpaceRegion *Space = Reg->getMemorySpace();
+  if (!(isa(Reg) && isa(Space))) {
+// A symbolic region in unknown space represents an unknown pointer that
+// may point into the middle of an array, so we don't look for underflows.
+// Both conditions are significant because we want to check underflows in
+// symbolic regions on the heap (which may be introduced by checkers that
+// call SValBuilder::getConjuredHeapSymbolVal()) and non-symbolic regions
+// (e.g. a field subregion of a symbolic region) in unknown space.
 auto [state_precedesLowerBound, state_withinLowerBound] =
 compareValueToThreshold(state, ByteOffset,
 svalBuilder.makeZeroArrayIndex(), svalBuilder);
@@ -195,7 +199,7 @@
 
   // CHECK UPPER BOUND
   DefinedOrUnknownSVal Size =
-  getDynamicExtent(state, RawOffset->getRegion(), svalBuilder);
+  getDynamicExtent(state, Reg, svalBuilder);
   if (auto KnownSize = Size.getAs()) {
 auto [state_withinUpperBound, state_exceedsUpperBound] =
 compareValueToThreshold(state, ByteOffset, *KnownSize, svalBuilder);


Index: clang/test/Analysis/out-of-bounds.c
===
--- clang/test/Analysis/out-of-bounds.c
+++ clang/test/Analysis/out-of-bounds.c
@@ -151,11 +151,23 @@
 // Don't warn when indexing below the start of a symbolic region's whose
 // base extent we don't know.
 int *get_symbolic(void);
-void test_index_below_symboloc(void) {
+void test_underflow_symbolic(void) {
   int *buf = get_symbolic();
   buf[-1] = 0; // no-warning;
 }
 
+// But warn if we understand the internal memory layout of a symbolic region.
+typedef struct {
+  int id;
+  char name[256];
+} user_t;
+
+user_t *get_symbolic_user(void);
+char test_underflow_symbolic_2() {
+  user_t *user = get_symbolic_user();
+  return user->name[-1]; // expected-warning{{Out of bound memory access}}
+}
+
 void test_incomplete_struct(void) {
   extern struct incomplete incomplete;
   int *p = (int *)
@@ -173,4 +185,3 @@
   buf[x] = 1;
   clang_analyzer_eval(x <= 99); // expected-warning{{TRUE}}
 }
-
Index: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
===
--- 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-03 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy marked an inline comment as done.
donat.nagy added a comment.

@steakhal  Thanks for the review!

I answered some of your questions; and I'll handle the rest soon.




Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:150
+  SVB.evalBinOp(State, Comparison, Val, LimitVal, SVB.getConditionType());
+  if (auto DURes = ResultVal.getAs()) {
+auto [StTrue, StFalse] = State->assume(DURes.value());

steakhal wrote:
> How about swapping these branches on the `ResultVal.isUndef()` predicate?
The argument of `ProgramState::assume()` must be a `DefinedOrUnknownSVal`, so I 
need to cast the return value of `evalBinOp` (which may be Undefined) to this 
type.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:157-158
+}
+// The code may be valid, so let's assume that it's valid:
+State = StTrue;
+if (StFalse) {

steakhal wrote:
> I wonder if we should add a message here that we introduced a new constraint: 
> "X assumed to be non-negative".
> To add such a message, you should do something similar to what is done in the 
> `StdLibraryFunctionsChecker`.
> Take the `C.getPredecessor()` and chain all your NoteTags by calling the 
> `Pred = C.addTransition(NewState, Pred, Tag)` like this. This way. This is 
> how one "sequence of ExplodedNodes" can be made.
This is already done by `BitwiseShiftValidator::createNoteTag()`. It's a bit 
complex because the checker merges all its assumptions into a single note tag 
(because I didn't want to emit two or three notes directly after each other.



Comment at: clang/test/Analysis/analyzer-config.c:36
 // CHECK-NEXT: cfg-temporary-dtors = true
+// CHECK-NEXT: consider-single-element-arrays-as-flexible-array-members = true
+// CHECK-NEXT: core.BitwiseShift:Pedantic = false

steakhal wrote:
> I'm pretty sure I got rid of this one :D
> Consider rebasing on top of `llvm/main`.
Oops, this might be an error introduced during a rebase... Thanks for catching 
it!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153954: [clang][analyzer] Fix empty enum handling in EnumCastOutOfRange checker

2023-08-03 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/test/Analysis/enum-cast-out-of-range.cpp:203
+
+enum class empty_unfixed {};
+

steakhal wrote:
> donat.nagy wrote:
> > Consider using "specified" and "unspecified" instead of "fixed" and 
> > "unfixed", because the rest of the test file uses them and in my opinion 
> > "unfixed" looks strange in this context.  (I know that e.g. 
> > https://en.cppreference.com/w/cpp/language/enum speaks about "fixed" 
> > underlying context, but it doesn't use "unfixed".) 
> How about calling it "plain" or "common"?
"plain" is also a good alternative for "unfixed" (although it would still be 
inconsistent with the earlier cases); but "common" is confusing, because in 
addition to the "plain, regular, normal" meaning it can also mean "mutual; 
shared by more than one ".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153954/new/

https://reviews.llvm.org/D153954

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153954: [clang][analyzer] Fix empty enum handling in EnumCastOutOfRange checker

2023-08-03 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM, with a little bikeshedding ;-)




Comment at: clang/test/Analysis/enum-cast-out-of-range.cpp:203
+
+enum class empty_unfixed {};
+

Consider using "specified" and "unspecified" instead of "fixed" and "unfixed", 
because the rest of the test file uses them and in my opinion "unfixed" looks 
strange in this context.  (I know that e.g. 
https://en.cppreference.com/w/cpp/language/enum speaks about "fixed" underlying 
context, but it doesn't use "unfixed".) 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153954/new/

https://reviews.llvm.org/D153954

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155715: [clang][analyzer] Improve StdCLibraryFunctions socket send/recv functions.

2023-08-02 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM, I don't have anything significant to add.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155715/new/

https://reviews.llvm.org/D155715

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-08-01 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302
+  pluralSuffix(MaximalAllowedShift));
+R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(),
+Ctx.getLocationContext()});
+Ctx.emitReport(std::move(R));

gamesh411 wrote:
> donat.nagy wrote:
> > NoQ wrote:
> > > Can we just append this to the warning? The `addNote()` is useful for 
> > > notes that need to be placed in code outside of the execution path, but 
> > > if it's always next to the warning, it probably doesn't make sense to 
> > > display it separately.
> > I didn't append this to the warning because I felt that the warning text is 
> > already too long. (By the way, an earlier internal variant of this checker 
> > produced two separate notes next to the warning message.)
> > 
> > I tweaked the messages of this checker before initiating this review, but I 
> > feel that there's still some room for improvements. (E.g. in this 
> > particular case perhaps we could omit some of the details that are not 
> > immediately useful and could be manually calculated from other parts of the 
> > message...) 
> Just a thought on simplifying the diagnostics a bit:
> 
> Warning: "Right operand is negative in left shift"
> Note: "The result of left shift is undefined because the right operand is 
> negative"
> Shortened: "Undefined left shift due to negative right operand"
> 
> Warning: "Left shift by '34' overflows the capacity of 'int'"
> Note: "The result of left shift is undefined because the right operand '34' 
> is not smaller than 32, the capacity of 'int'"
> Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 bits)"
> 
> The more complex notes are a bit sketchy, as relevant information can be 
> lost, and the following solution is probably a bit too much, but could prove 
> to be an inspiration:
> 
> Warning: "Left shift of '1024' overflows the capacity of 'int'"
> CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 
> bits (including the sign bit), so some bits overflow"
> CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 
> bits for bitshift"
> C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 
> bits (excluding the sign bit), so some bits overflow"
> C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits 
> for bitshift"
> 
> Shortened:
> CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity 
> (32 bits, including sign)"
> C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (31 
> bits, excluding sign)"
Clarification about the `Msg`/`ShortMsg` distinction:
I'm intentionally separating the short warning messages and the longer note 
messages because `createBugReport()` enforces the convention that it will 
always emit a warning and a note at the bug location.

According to the comments in the source code, the intention is that the note 
contains all the relevant information, while the warning is a brief summary 
that can be displayed in situations where the notes wouldn't fit the UI.

IIUC many checkers ignore this distinction and emit the same (often long and 
cumbersome) message both as a note and as a warning (`createBugReport()` has a 
variant which takes only one message parameter and passes it to both 
locations), but I tried to follow it because I think it's ugly when the same 
message is repeated twice and there is some sense in providing both a brief 
summary and a full description that doesn't use potentially-ambiguous 
abbreviations to save space.

Of course I could also accept a community decision that this "brief warning / 
full note" distinction is deprecated and will be eliminated (because e.g. 
front-end utilities are not prepared to handle it), but in that case I'd 
strongly suggest a redesign where we eliminate the redundantly printed 'note' 
message. (There is no reason to say the same thing twice! There is no reason to 
say the same thing twice!)

On the other hand, in addition to this `Msg`/`ShortMsg` distinction, this part 
of the code also adds the extra `LeftNote` (as a remnant from an earlier 
internal version of this checker), and that's that's what I'd like to merge 
into `Msg` (following NoQ's suggestion and keeping it distinct from the 
`ShortMsg`).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-07-27 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D156312#4537200 , @NoQ wrote:

>> (4) UBOR exhibits buggy behavior in code that involves cast expressions
>
> Hmm, what makes your check more resilient to our overall type-incorrectness?

The code of UBOR (UndefResultChecker.cpp) uses the method `LHS->countl_zero()` 
(count zeros on the left side of the binary representation) which is very 
convenient but uses the "cached" bit width that is stored in the APSInt and 
apparently not updated by the casts. The checker proposed in this commit avoids 
this pitfall because it calls `getActiveBits()` on the APSInt and subtracts the 
result from the bit width of the type of the LHS (which is queried from the AST 
node).




Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:94-98
+void BitwiseShiftValidator::run() {
+  if (isValidShift()) {
+Ctx.addTransition(State, createNoteTag());
+  }
+}

NoQ wrote:
> Because `addTransition()` is a very confusing API that's very easy to misuse 
> in subtle ways (causing unexpected state splits), I feel anxious when I see 
> functions like
> ```
>   bool isValidShift();
>   bool isOvershift();
>   bool isOperandNegative(OperandSide Side);
>   bool isLeftShiftOverflow();
> ```
> that look like boolean getters but in fact call `addTransition()` under the 
> hood. If we could at least call them `checkOvershift()` etc., that'd be much 
> better. Ideally I wish there was some safeguard against producing redundant 
> transitions, like make them return an `ExplodedNode` and chain them, or 
> something like that.
These functions were originally called `checkXXX()` but I felt that the meaning 
of their boolean return value was too confusing with that naming scheme.

I'll try to ensure that `emitReport` or `addTransition` are only called on the 
topmost layer (the `run()` method) and change the return types of internal 
subroutines to make them return nullable pointers (e.g. 
`std::unique_ptr`) instead of `bool`s. This would (1) 
eliminate the "what does `true` mean at this point?" issues (2) clarify that 
this checker never performs more than one transition. (If the shift is invaild, 
we transition to a sink node by emitting a bug; if the shift is vaild, we 
perform a transition to record the state update.)



Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302
+  pluralSuffix(MaximalAllowedShift));
+R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(),
+Ctx.getLocationContext()});
+Ctx.emitReport(std::move(R));

NoQ wrote:
> Can we just append this to the warning? The `addNote()` is useful for notes 
> that need to be placed in code outside of the execution path, but if it's 
> always next to the warning, it probably doesn't make sense to display it 
> separately.
I didn't append this to the warning because I felt that the warning text is 
already too long. (By the way, an earlier internal variant of this checker 
produced two separate notes next to the warning message.)

I tweaked the messages of this checker before initiating this review, but I 
feel that there's still some room for improvements. (E.g. in this particular 
case perhaps we could omit some of the details that are not immediately useful 
and could be manually calculated from other parts of the message...) 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-07-26 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy updated this revision to Diff 544317.
donat.nagy added a comment.

(Fix incorrect filename mark in the license header.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
 
 int testNegativeShift(int a) {
   if (a == -5) {
Index: clang/test/Analysis/diagnostics/track_subexpressions.cpp
===
--- clang/test/Analysis/diagnostics/track_subexpressions.cpp
+++ clang/test/Analysis/diagnostics/track_subexpressions.cpp
@@ -12,10 +12,10 @@
 
 void shift_by_undefined_value() {
   uint8_t shift_amount = get_uint8_max(); // expected-note{{'shift_amount' initialized to 255}}
-// expected-note@-1{{Calling 'get_uint8_max'}}
-// expected-note@-2{{Returning from 'get_uint8_max'}}
-  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
-  // expected-note@-1{{The result of the left shift is undefined due to shifting by '255', which is greater or equal to the width of type 'int'}}
+  // expected-note@-1{{Calling 'get_uint8_max'}}
+  // expected-note@-2{{Returning from 'get_uint8_max'}}
+  (void)(TCP_MAXWIN << shift_amount); // expected-warning{{Left shift by '255' overflows the capacity of 'int'}}
+  // 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-07-26 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I tested this commit on several open source projects (in the default mode 
Pedantic=false), the following table summarizes the results:

| vim_v8.2.1920_baseline_vs_vim_v8.2.1920_with_bitwise_shift
   | New reports 

   | Lost reports 

   | no differences 
|
| 
openssl_openssl-3.0.0-alpha7_baseline_vs_openssl_openssl-3.0.0-alpha7_with_bitwise_shift
 | New reports 

 | Lost reports 

 | +2 FPs from same invalid loop assumption [1]   |
| sqlite_version-3.33.0_baseline_vs_sqlite_version-3.33.0_with_bitwise_shift
   | New reports 

   | Lost reports 

   | one FP lost for unclear reasons
|
| ffmpeg_n4.3.1_baseline_vs_ffmpeg_n4.3.1_with_bitwise_shift
   | New reports 

   | Lost reports 

   | 21 new reports, 14 lost reports, 
mostly FPs [2]|
| postgres_REL_13_0_baseline_vs_postgres_REL_13_0_with_bitwise_shift
   | New reports 

   | Lost reports 

   | two FPs are "converted" to new checker, one 
unreadable report lost |
| 
libwebm_libwebm-1.0.0.27_baseline_vs_libwebm_libwebm-1.0.0.27_with_bitwise_shift
 | New reports 

 | Lost reports 

 | no differences   
  |
| xerces_v3.2.3_baseline_vs_xerces_v3.2.3_with_bitwise_shift
   | New reports 

   | Lost reports 

   | no differences 
|
| bitcoin_v0.20.1_baseline_vs_bitcoin_v0.20.1_with_bitwise_shift
   | New reports 

   | Lost reports 

   | 6 new FPs caused by incorrect config of 
our CI [3], one FP lost|
| protobuf_v3.13.0_baseline_vs_protobuf_v3.13.0_with_bitwise_shift  
   | New reports 

 | Lost reports 

[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

2023-07-26 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy created this revision.
Herald added subscribers: steakhal, manas, ASDenysPetrov, martong, gamesh411, 
dkrupp, Szelethus, mikhail.ramalho, a.sidorin, szepet, baloghadamsoftware, 
xazax.hun.
Herald added a reviewer: NoQ.
Herald added a project: All.
donat.nagy requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This commit releases a checker that was developed to a stable level in the 
Ericsson-internal fork of Clang Static Analyzer.

Note that the functionality of this checker overlaps with 
core.UndefinedBinaryOperatorResult ("UBOR"), but there are several differences 
between them:
(1) UBOR is only triggered when the constant folding performed by the Clang 
Static Analyzer engine determines that the value of a binary
operator expression is undefined; this checker can report issues where the 
operands are not constants.
(2) UBOR has unrelated checks for handling other binary operators, this checker 
only examines bitwise shifts.
(3) This checker has a Pedantic flag and by default does not report expressions 
(e.g. -2 << 2) that're undefined by the standard but consistently supported in 
practice.
(4) UBOR exhibits buggy behavior in code that involves cast expressions, e.g.

  void foo(unsigned short s) {
if (s == 2) {
  (void) ((unsigned int) s) << 16;
}
  }

Later it would be good to eliminate this overlap (perhaps by deprecating and 
then eliminating the bitwise shift handling in UBOR), but in my opinion that 
that belongs to separate commits.

Co-authored-by: Endre Fulop 


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156312

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/analyzer-enabled-checkers.c
  clang/test/Analysis/bitwise-ops-nocrash.c
  clang/test/Analysis/bitwise-ops.c
  clang/test/Analysis/bitwise-shift-common.c
  clang/test/Analysis/bitwise-shift-pedantic.c
  clang/test/Analysis/bitwise-shift-sanity-checks.c
  clang/test/Analysis/bitwise-shift-state-update.c
  clang/test/Analysis/casts.c
  clang/test/Analysis/diagnostics/track_subexpressions.cpp
  clang/test/Analysis/left-shift-cxx2a.cpp
  clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
  clang/test/Analysis/symbol-simplification-nonloc-loc.cpp

Index: clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
===
--- clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
+++ clang/test/Analysis/symbol-simplification-nonloc-loc.cpp
@@ -14,7 +14,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -29,7 +29,7 @@
   if (p) {
 // no-crash
   }
-  if (p == (int *)0x404) {
+  if (p == (int *)0x1b) {
 // no-crash
   }
 }
@@ -43,8 +43,6 @@
   nonloc_OP_loc(p, BINOP(-)); // no-crash
 
   // Bitwise operators:
-  // expected-warning@+2 {{The result of the left shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
-  // expected-warning@+2 {{The result of the right shift is undefined due to shifting by '1028', which is greater or equal to the width of type 'int' [core.UndefinedBinaryOperatorResult]}}
   nonloc_OP_loc(p, BINOP(<<)); // no-crash
   nonloc_OP_loc(p, BINOP(>>)); // no-crash
   nonloc_OP_loc(p, BINOP(&));
Index: clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
===
--- clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -24,6 +24,7 @@
 // CHECK-NEXT: apiModeling.TrustReturnsNonnull
 // CHECK-NEXT: apiModeling.llvm.CastValue
 // CHECK-NEXT: apiModeling.llvm.ReturnValue
+// CHECK-NEXT: core.BitwiseShift
 // CHECK-NEXT: core.DivideZero
 // CHECK-NEXT: core.DynamicTypePropagation
 // CHECK-NEXT: core.NonnilStringConstants
Index: clang/test/Analysis/left-shift-cxx2a.cpp
===
--- clang/test/Analysis/left-shift-cxx2a.cpp
+++ clang/test/Analysis/left-shift-cxx2a.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx2a -std=c++2a %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-disable-checker core.BitwiseShift -triple x86_64-apple-darwin13 -Wno-shift-count-overflow -verify=expected,cxx17 -std=c++17 %s
+// RUN: %clang_analyze_cc1 

[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-25 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Mostly LGTM, but I'd like to ask you to ensure consistent line wrapping (to 
e.g. 72 or 80 characters) in the free-flowing text that was added or rewritten 
by this commit. (I know that this is a minor question because the linebreaks 
won't be visible in the final output, but the raw markdown is itself a 
human-readable format and we should keep it reasonably clean.)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-07-13 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

(Just added some side remarks about string handling annoyances.)

@balazske Please (1) handle the `dyn_cast` request of  @steakhal and also (2) 
do something with this "how to convert `StringRef` to `char *`" question that 
we're bikeshedding. I hope that after those we could finally merge this commit 
sequence.




Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1305
+  std::string Note =
+  llvm::formatv(Case.getNote().str().c_str(),
+cast(Call.getDecl())->getDeclName());

steakhal wrote:
> donat.nagy wrote:
> > Consider using the method `StringRef::data()` which directly converts a 
> > `StringRef` to a `const char *`. Your two-step conversion has the advantage 
> > that it adds a `\0` terminator even if the `StringRef` isn't 
> > null-terminated, but I cannot imagine a "natural" code change that would 
> > introduce references to non-null-terminated char arrays as note message 
> > templates.
> I would prefer not to rely on that `StringRef`s (here) are expected to be 
> null-terminated, unless benchmarks demonstrate that this is important. If 
> that would turn out to be the case, then we would need to enforce this using 
> some sort of `assert` expression.
I think the way of expressing this concrete conversion is a very low priority 
question (a.k.a. bikeshedding), so returning to the previously used two-step 
conversion is also fine for me.

The primary goal of this ".data()" shortcut was not performance, but clarity: 
those chained conversions triggered my "detect suspicious code" instincts and 
this was the best alternative that I could find. In general I feel that the 
LLVM codebase has way too many string types, and the back-and-forth conversions 
between them add lots of unnecessary noise to the code.

In this particular case I think it's the fault of `llvm::formatv` that it 
doesn't accept `StringRef` (the most commonly used non-owning string) and in 
general it's problematic design that there is no "good" conversion between 
`StringRef` and C-style strings in either direction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154423: [clang][analyzer] Add all success/failure messages to StdLibraryFunctionsChecker.

2023-07-10 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D154423#4485009 , @balazske wrote:

> It would be more simple to handle the standard streams in `StreamChecker` 
> only. [...]

That's a reasonable plan, especially if we can bring that checker out of alpha 
in the foreseeable future. I like that it avoids code duplication.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154423/new/

https://reviews.llvm.org/D154423

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153776: [clang][analyzer] Display notes in StdLibraryFunctionsChecker only if interesting

2023-07-10 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153776/new/

https://reviews.llvm.org/D153776

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154423: [clang][analyzer] Add all success/failure messages to StdLibraryFunctionsChecker.

2023-07-10 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

Ok, then I think you can finally merge this commit and its two predecessors. 
I'm happy to see that this improvement is complete.

I hope that you can later add a special case for the standard streams, but I 
agree that it should be a separate commit.

Thanks for clarifying the `ftell` situation, that's clearly an engine issue 
that should not block this checker improvement. However, I'd be really glad to 
see a cleanup of the interestingness system (as a separate project), because 
its inconsistency leads to lots of very confusing bug reports.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154423/new/

https://reviews.llvm.org/D154423

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154423: [clang][analyzer] Add all success/failure messages to StdLibraryFunctionsChecker.

2023-07-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I looked through most of the open source results, and they look promising.

However I've seen one questionable tendency: there are many reports (e.g. this 
one 
)
 where the checker assumes that `fileno(stdin)`, `fileno(stdout)` or 
`fileno(stderr)` fails. How realistic are these assumptions? Do we need to 
bother the programmer with these or should/can we silence them (and perhaps 
return the known fileno values corresponding to these standard streams)?

Another, concrete issue is this report 

 where the analyzer assumes that `ftell` returns `-1` (that gets converted to 
2**64-1 because unsigned numbers are crazy), but there is no note tag (not even 
in the _3 run) and I don't see where does this assumption come from (although I 
didn't do a deep analysis).

Apart from these, the false positives are coming from general limitations of 
the engine that cannot be reasonably avoided.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154423/new/

https://reviews.llvm.org/D154423

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154509: [clang][analyzer] StdLibraryFunctionsChecker: Allow NULL buffer in `fread` and `fwrite` if size is zero.

2023-07-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

By the way here is a concrete example where this patch is needed to squash a 
false positive: 
https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_stdclf_notetag_interesting_test_3=on=49939ae1896dff1ad782092b8534bd68=%2arelcache.c=1942889


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154509/new/

https://reviews.llvm.org/D154509

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153954: [WIP][clang][analyzer] Relax alpha.cplusplus.EnumCastOutOfRange checker

2023-07-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D153954#4480296 , @steakhal wrote:

> [...] I'd not recommend moving the actual implementation anywhere.

I agree, I mostly spoke figuratively, suggesting that this checker could end up 
in the optin group when it's eventually brought out of alpha.




Comment at: clang/test/Analysis/enum-cast-out-of-range.cpp:44
+// Suppress unused warnings
+[](...){}(unscoped, scoped);
+  }

steakhal wrote:
> `void sink(...);` would have achieved the same and I'd say it's less complex.
> `sink(unscoped, scoped);`
Or just `(void)unscoped; (void)scoped;` if we're bikeshedding this test...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153954/new/

https://reviews.llvm.org/D153954

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153954: [WIP][clang][analyzer] Relax alpha.cplusplus.EnumCastOutOfRange checker

2023-07-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I think the old "report when the value stored in an enum type is not equal to 
one of the enumerators" behavior would be useful as an opt-in checker that 
could be adopted by some projects; while the new "relaxed" version is too bland 
to be really useful. I'd suggest keeping the old behavior in the general case, 
eliminating the "obvious" false positives like `std::byte` (don't report types 
that have no enumerators) and moving this checker towards the optin group.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153954/new/

https://reviews.llvm.org/D153954

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154509: [clang][analyzer] StdLibraryFunctionsChecker: Allow NULL buffer in `fread` and `fwrite` if size is zero.

2023-07-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM.




Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:413-414
+using ValueConstraint::ValueConstraint;
+ArgNo SizeArg1N;
+std::optional SizeArg2N;
+// This variable has a role when we negate the constraint.

What would you think about a `SmallVector<2>` for these? It would allow you to 
handle them with a (short) for loop instead of separate commands.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154509/new/

https://reviews.llvm.org/D154509

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154603: [analyzer][clangsa] Add new option to alpha.security.cert.InvalidPtrChecker

2023-07-07 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

The commit looks good in general, I have a few minor suggestions and a serious 
question about the state transition bureaucracy.




Comment at: clang/lib/StaticAnalyzer/Checkers/cert/InvalidPtrChecker.cpp:65-66
+  // If set to true, consider getenv calls as invalidating operations on the
+  // environment variable buffer. This is implied in the standard, but can lead
+  // to practical false positives.
+  bool InvalidatingGetEnv = false;

Nitpick: "practical false positives" sounds strange for me, consider writing
[...but] "in practice does not cause problems (in the commonly used 
environments)"
or something similar.



Comment at: clang/lib/StaticAnalyzer/Checkers/cert/InvalidPtrChecker.cpp:125
 
-  const NoteTag *Note =
-  C.getNoteTag([SymbolicEnvPtrRegion, FunctionName](
-   PathSensitiveBugReport , llvm::raw_ostream ) {
-if (!BR.isInteresting(SymbolicEnvPtrRegion))
-  return;
-Out << '\'' << FunctionName
-<< "' call may invalidate the environment parameter of 'main'";
-  });
+  ExplodedNode *CurrentChainEnd = nullptr;
+

Perhaps add a comment that clarifies that passing a `nullptr` as the 
ExplodedNode to `addTransition` is equivalent to specifying the current node. I 
remember this because I was studying its implementation recently, but I 
would've been surprised and suspicious otherwise.



Comment at: clang/lib/StaticAnalyzer/Checkers/cert/InvalidPtrChecker.cpp:218
+State->add(Call.getReturnValue().getAsRegion());
+   C.addTransition(State);
+  }

I fear that this state transition will go "sideways" and the later state 
transitions (which add the note tags) will branch off instead of building onto 
this. IIUC calling `CheckerContext::addTransition` registers the transition 
without updating the "current ExplodedNode" field of `CheckerContext`, so you 
need to explicitly store and pass around the ExplodedNode returned by it if you 
want to build on it.

This is an ugly and counter-intuitive API, and I also ran into a very similar 
issue a few weeks ago (@Szelethus helped me).



Comment at: clang/test/Analysis/cert/env34-c.c:6
+//
+// TODO: write test cases that follow the pattern:
+//   "getenv -> store pointer -> setenv -> use stored pointer"

gamesh411 wrote:
> This test file is incomplete.
> I would welcome suggestions here as to how to test this.
> Should a new file be created for the config option with different test cases, 
> or is this file to be extended?
Personally I'd prefer putting those cases into a separate files, because this 
test file is already 340 lines long and it'd be difficult to understand if it 
was filled with conditional checks. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154603/new/

https://reviews.llvm.org/D154603

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154423: [clang][analyzer] Add all success/failure messages to StdLibraryFunctionsChecker.

2023-07-05 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

These code changes seem to be promising. Please upload results on the open 
source projects (like the ones that you uploaded previously), and I hope that 
after verifying those I we can finally merge this set of commits.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154423/new/

https://reviews.llvm.org/D154423

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154325: [analyzer][NFC] Move away from using raw-for loops inside StaticAnalyzer

2023-07-04 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I don't have concrete remarks, but I would like to note that I'm happy to see 
this cleanup :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154325/new/

https://reviews.llvm.org/D154325

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153776: [clang][analyzer] Display notes in StdLibraryFunctionsChecker only if interesting

2023-07-03 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

In D153776#4467627 , @balazske wrote:

> The success/failure note tags are not added to all functions, `dup` is one of 
> these, this means at `dup` no note tag is shown. A next patch can be made to 
> add these messages to all functions. The other places look good, but 
> CodeChecker is a bit tricky, you must select 
> //*_stdclf_notetag_interesting_test_2// at the small arrow after the "found 
> in:" text (upper right corner). The link is good but not that instance of the 
> bug is displayed because only the note tags are different.

I think we should definitely add success/failure note tags to all functions 
where this checker can suddenly assume that "Oops, this failed". If this 
"Assuming that 'foo' fails" message is not there, it's a very natural reaction 
to search for some earlier note that would let the checker deduce that this 
function will fail here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153776/new/

https://reviews.llvm.org/D153776

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

See results and their discussion on the review of the tightly connected 
follow-up commit D153776 .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153776: [clang][analyzer] Display notes in StdLibraryFunctionsChecker only if interesting

2023-06-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy requested changes to this revision.
donat.nagy added a comment.
This revision now requires changes to proceed.

Thanks for attaching the test results! Unfortunately it seems that it was 
important to look at them, because I noticed that in the posgresql codebase 
there are many bug reports on calls like `func1(func2(...), ...)` where the 
checker does not print the expected "Assuming that 'func2' fails" note: 
fdopen/dup 
,
 fstat/fileno 
,
 isatty/fileno 
,
 another isatty/fileno 
,
 a third isatty/fileno 
,
 a second dup/fileno 

 (this is not an exhaustive list, but I think I listed the majority of them).

Please investigate these before merging your commits.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153776/new/

https://reviews.llvm.org/D153776

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153889: [analyzer][NFC] Fix dangling StringRef in barely used code

2023-06-30 Thread Donát Nagy via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG1d75b18843fb: [analyzer][NFC] Fix dangling StringRef in 
barely used code (authored by donat.nagy).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153889/new/

https://reviews.llvm.org/D153889

Files:
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h


Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
===
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
@@ -316,8 +316,8 @@
   ///bug path significantly shorter.
   const NoteTag *getNoteTag(StringRef Note, bool IsPrunable = false) {
 return getNoteTag(
-[Note](BugReporterContext &,
-   PathSensitiveBugReport &) { return std::string(Note); },
+[Note = std::string(Note)](BugReporterContext &,
+   PathSensitiveBugReport &) { return Note; },
 IsPrunable);
   }
 


Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
===
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
@@ -316,8 +316,8 @@
   ///bug path significantly shorter.
   const NoteTag *getNoteTag(StringRef Note, bool IsPrunable = false) {
 return getNoteTag(
-[Note](BugReporterContext &,
-   PathSensitiveBugReport &) { return std::string(Note); },
+[Note = std::string(Note)](BugReporterContext &,
+   PathSensitiveBugReport &) { return Note; },
 IsPrunable);
   }
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153674: [dataflow] Disallow implicit copy of Environment, use fork() instead

2023-06-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

When I tried to compile with gcc 7.5 after your fix commit (2f7d30dee826 
), I got 
another very similar compilation error from a different location and I pushed 
commit cec30e2b190b 
 to fix 
that. After this second correction I can successfully run `ninja 
check-clang-analysis` with gcc7.5.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153674/new/

https://reviews.llvm.org/D153674

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153674: [dataflow] Disallow implicit copy of Environment, use fork() instead

2023-06-30 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Thank you for the fix :) !


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153674/new/

https://reviews.llvm.org/D153674

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153776: [clang][analyzer] Display notes in StdLibraryFunctionsChecker only if interesting

2023-06-29 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM if you rebase these changes onto the most recent version of D153612 
. Thanks for adding this interestingness 
check!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153776/new/

https://reviews.llvm.org/D153776

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-29 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait until you can merge this together with D153776 
.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-29 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

Thanks for the update! I have two minor remarks (string handling is complicated 
in C++...) but the code looks good otherwise.




Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1305
+  std::string Note =
+  llvm::formatv(Case.getNote().str().c_str(),
+cast(Call.getDecl())->getDeclName());

Consider using the method `StringRef::data()` which directly converts a 
`StringRef` to a `const char *`. Your two-step conversion has the advantage 
that it adds a `\0` terminator even if the `StringRef` isn't null-terminated, 
but I cannot imagine a "natural" code change that would introduce references to 
non-null-terminated char arrays as note message templates.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1321
+  if (Node->succ_size() > 1)
+return Note.c_str();
+  return "";

The return type of this lambda is `std::string`, so I think it's pointless to 
convert `std::string Note` to a C string (which will be used to implicitly 
construct another `std::string`). 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153363: [clang][analyzer] No end-of-file when seek to file begin.

2023-06-29 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy accepted this revision.
donat.nagy added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153363/new/

https://reviews.llvm.org/D153363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153674: [dataflow] Disallow implicit copy of Environment, use fork() instead

2023-06-29 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

I'm also using `gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0` (which supported 
according to the LLVM "Getting Started" page 
)
 and I see the same std::optional compilation errors. Please revert this commit 
or fix the build.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153674/new/

https://reviews.llvm.org/D153674

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy requested changes to this revision.
donat.nagy added a comment.
This revision now requires changes to proceed.

As I started to review the code of the follow-up commit D153776 
, I noticed a dangling `StringRef` bug which 
belongs to this review.

Moreover, as a minor remark I'd note that LLVM has a dedicated class for 
storing string literal constants (btw I learned this from @Szelethus yesterday).




Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1303
+  // use it as a note tag.
+  StringRef Note = Case.getNote();
+  if (Summary.getInvalidationKd() == EvalCallAsPure) {

This variable will be captured and stored by the lambda functions, so it should 
own the memory area of  the string.

In the current code this did not cause problems because this StringRef points 
to the memory area of a string literal (which stays valid); but if later 
changes introduced dynamically generated note tags, then this code would dump 
random memory garbage.

By the way, a very similar issue survived almost four years in 
CheckerContext.h, I'm fixing it in D153889.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1318
+  if (Node->succ_size() > 1)
+return Note.str();
+  return "";

Also update this when you change the type of `Note` to `std::string`.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1687-1688
 
+  const char *GenericSuccessMsg = "Assuming that the call is successful";
+  const char *GenericFailureMsg = "Assuming that the call fails";
+

Consider using the `StringLiteral` subclass of `StringRef` which is designed 
for this kind of application (and determines the length of the string in 
compile time instead of calling `strlen` during a runtime `const char * → 
StringRef` conversion):
https://llvm.org/doxygen/classllvm_1_1StringLiteral.html
(After the suggested change, also update the code that uses these variables!)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-28 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

The source code changes LGTM. I'll soon check the results on the open source 
projects and give a final approval based on that.

By the way, I think this commit and the "display notes only if interesting" 
follow-up change (D153776 ) should be merged 
at the same time (but I'd guess that you already planned to do it that way).




Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1299
+// StdLibraryFunctionsChecker.
+ExplodedNode *Pred = const_cast(Node);
+if (!Case.getNote().empty()) {

balazske wrote:
> donat.nagy wrote:
> > balazske wrote:
> > > donat.nagy wrote:
> > > > Can you explain why is it safe to use `const_cast` here? (I don't see 
> > > > any concrete issue, but the engine has lots of invariants / unwritten 
> > > > rules and I fear that this might break one of them.)
> > > The node `Pred` should be modified only later when a successor is added 
> > > (`addTransition` has non-const parameter).
> > I understood that you //need// a non-const `ExplodedNode *` because 
> > `addTransition` expects it; I want to understand why you are //allowed to// 
> > `const_cast` it (why doesn't this confuse the engine logic).
> > 
> > Equivalent question from the other direction: Why did the author of 
> > `CheckerContext::getPredecessor()` specify that its return value is a 
> > //const// pointer to `ExplodedNode`?
> > 
> > If we can conclude that `const_cast` is valid in this kind of situation, 
> > then I'd also consider simply removing the "const" from the return type of 
> > `getPredecessor`.
> The `const_cast` is not needed at all if `Pred` and `Node` is made non-const, 
> and `getPredecessor` has a non-const version. The `Node` is saved because we 
> want to add transitions to it, it makes no sense to have it (a pointer to) 
> const. (Probably the const comes from a time when the `Node` was used only 
> for the lambda? In the lambda it could be const, if it matters.)
Sorry, it seems that I badly misread the source code of CheckerContext.h (I 
probably looked at the wrong line when I briefly jumped to the definition of 
`getPredecessor`). In fact, `getPredecessor` has only one version and it 
returns non-const `ExplodedNode *`.

This means that your code is (of course) completely valid.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153889: [analyzer][NFC] Fix dangling StringRef in barely used code

2023-06-27 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy created this revision.
donat.nagy added reviewers: Szelethus, gamesh411, steakhal.
Herald added subscribers: manas, ASDenysPetrov, dkrupp, mikhail.ramalho, 
a.sidorin, szepet, baloghadamsoftware, xazax.hun.
Herald added a reviewer: NoQ.
Herald added a project: All.
donat.nagy requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

`CheckerContext::getNoteTag` has a shorthand version that takes a plain 
'`StringRef Note`' instead of a lambda that calculates the note.

The old implementation of this method was incorrect because it created a lambda 
that captured the StringRef, which was dereferenced much later, when the 
NoteTags were visited.

In the current codebase this does not cause errors because this method is 
called only once, and there the `Note` argument is a string literal that 
remains valid. However, I tried to use this method in a checker that I was 
prototyping, and there it printed random memory junk (instead of the message 
that I composed in a local variable).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D153889

Files:
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h


Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
===
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
@@ -316,8 +316,8 @@
   ///bug path significantly shorter.
   const NoteTag *getNoteTag(StringRef Note, bool IsPrunable = false) {
 return getNoteTag(
-[Note](BugReporterContext &,
-   PathSensitiveBugReport &) { return std::string(Note); },
+[Note = std::string(Note)](BugReporterContext &,
+   PathSensitiveBugReport &) { return Note; },
 IsPrunable);
   }
 


Index: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
===
--- clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
+++ clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
@@ -316,8 +316,8 @@
   ///bug path significantly shorter.
   const NoteTag *getNoteTag(StringRef Note, bool IsPrunable = false) {
 return getNoteTag(
-[Note](BugReporterContext &,
-   PathSensitiveBugReport &) { return std::string(Note); },
+[Note = std::string(Note)](BugReporterContext &,
+   PathSensitiveBugReport &) { return Note; },
 IsPrunable);
   }
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-27 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1299
+// StdLibraryFunctionsChecker.
+ExplodedNode *Pred = const_cast(Node);
+if (!Case.getNote().empty()) {

balazske wrote:
> donat.nagy wrote:
> > Can you explain why is it safe to use `const_cast` here? (I don't see any 
> > concrete issue, but the engine has lots of invariants / unwritten rules and 
> > I fear that this might break one of them.)
> The node `Pred` should be modified only later when a successor is added 
> (`addTransition` has non-const parameter).
I understood that you //need// a non-const `ExplodedNode *` because 
`addTransition` expects it; I want to understand why you are //allowed to// 
`const_cast` it (why doesn't this confuse the engine logic).

Equivalent question from the other direction: Why did the author of 
`CheckerContext::getPredecessor()` specify that its return value is a //const// 
pointer to `ExplodedNode`?

If we can conclude that `const_cast` is valid in this kind of situation, then 
I'd also consider simply removing the "const" from the return type of 
`getPredecessor`.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1309
+[Node, Note](PathSensitiveBugReport ) -> std::string {
+  if (Node->succ_size() > 1)
+return Note.str();

balazske wrote:
> donat.nagy wrote:
> > It's surprising to see this check inside the lambda, as its result does not 
> > depend on `BR`. My best guess is that it's performed here because the 
> > successors of `Node` will appear between the execution of the surrounding 
> > code and the execution of this lambda.
> > 
> > However, CheckerContext.h line 69-70 claims that "checkers should not 
> > retain the node in their state since the nodes might get invalidated." 
> > which would imply that the captured `Node` might be invalid when the lambda 
> > is called.
> This check is to decide if multiple cases could be applied, the same as if we 
> count how many times this place in the loop is executed (add a transition for 
> a case, constraints could be applied). This check is problematic because 
> other checkers can apply state splits before this checker is executed or 
> after it, in this way `StreamChecker` interferes with this code (it has a 
> state split for success/failure cases of same function, and here we see only 
> that a single case is applied on one branch). This is why this check is only 
> used in the `EvalCallAsPure` case (theoretically still another checker can 
> make a state split in PostCall before this where the same constraint is 
> applied, then the problem occurs again).
> 
> I made a solution that does not have this check but has 2 case loops instead, 
> but the mentioned problem (which exists when `if (Summary.getInvalidationKd() 
> == EvalCallAsPure)` is not used) did not go away. And it may not work to 
> search backwards for the first node with the same statement, because maybe 
> not the first one is where a state split is done.
> 
> I only think that if this lambda is called with the saved node, that node is 
> not invalid because it is part of a bug report call sequence.
> This check is problematic because [...details...]

Thanks for the explanation, now I see why this roundabout solution is needed.

> I only think that if this lambda is called with the saved node, that node is 
> not invalid because it is part of a bug report call sequence.

This is a good point, I think we can keep this "check successors in the lambda" 
solution. Please add a comment like "This check is performed inside the lambda, 
after other checkers had a chance to add other successors. Dereferencing the 
saved node object is valid because it's part of a bug report call sequence." to 
record the the reasoning that we discussed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-26 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added inline comments.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1309
+[Node, Note](PathSensitiveBugReport ) -> std::string {
+  if (Node->succ_size() > 1)
+return Note.str();

It's surprising to see this check inside the lambda, as its result does not 
depend on `BR`. My best guess is that it's performed here because the 
successors of `Node` will appear between the execution of the surrounding code 
and the execution of this lambda.

However, CheckerContext.h line 69-70 claims that "checkers should not retain 
the node in their state since the nodes might get invalidated." which would 
imply that the captured `Node` might be invalid when the lambda is called.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153612: [clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.

2023-06-26 Thread Donát Nagy via Phabricator via cfe-commits
donat.nagy added a comment.

This is a good and useful commit; but I have some questions connected to the 
state transition handling code that you had to modify. (The State/ExplodedNode 
APIs of the engine are really counter-intuitive...  :( )




Comment at: clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp:316-319
+  C, (Twine("'errno'") +
+  Twine(" may be undefined after successful call to '") + Twine(Fn) +
+  Twine("'"))
  .str());

Consider using `llvm:formatv()` here and in other similar messages, it is much 
less verbose.



Comment at: 
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:1299
+// StdLibraryFunctionsChecker.
+ExplodedNode *Pred = const_cast(Node);
+if (!Case.getNote().empty()) {

Can you explain why is it safe to use `const_cast` here? (I don't see any 
concrete issue, but the engine has lots of invariants / unwritten rules and I 
fear that this might break one of them.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153612/new/

https://reviews.llvm.org/D153612

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   >