[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-19 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp closed https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-18 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

I hope in the future we will have a more satisfying solution.
LGTM.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-18 Thread Daniel Krupp via cfe-commits

dkrupp wrote:

> As I'm not a maintainer, I could not push to your branch. Here is a patch 
> that I think has the missing pieces to satisfy my review. 
> [0001-fixup-analyzer-TaintPropagation-checker-strlen-shoul.patch.txt](https://github.com/llvm/llvm-project/files/12645128/0001-fixup-analyzer-TaintPropagation-checker-strlen-shoul.patch.txt)
>  Apply it to your branch by `git am 
> 0001-fixup-analyzer-TaintPropagation-checker-strlen-shoul.patch.txt`. After 
> that, I'm okay to squash merge this PR, if you are also okay with my 
> suggestions.

Thanks for the suggestions. I squashed it.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-18 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp updated 
https://github.com/llvm/llvm-project/pull/66086

>From 889c886c3eed31335531ec61ad2b48bef15414d8 Mon Sep 17 00:00:00 2001
From: Daniel Krupp 
Date: Fri, 8 Sep 2023 16:57:49 +0200
Subject: [PATCH] [analyzer] TaintPropagation checker strlen() should not
 propagate

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings.
It is a common pattern to copy user provided input
to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when
the attacker can directly specify the size of the allocated area
as an arbitrary large number (e.g. the value is converted
from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function
with the consideration not to emit warnings when the tainted value
cannot be "arbitrarily large" (such as the size of an already allocated
memory block).
---
 clang/docs/ReleaseNotes.rst   |  7 +
 clang/docs/analyzer/checkers.rst  |  4 +--
 .../Checkers/GenericTaintChecker.cpp  |  7 ++---
 .../test/Analysis/taint-diagnostic-visitor.c  | 13 +-
 clang/test/Analysis/taint-generic.c   | 26 +--
 5 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 3cdad2f7b9f0e5a..414cd7f62e2d764 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -409,6 +409,13 @@ Static Analyzer
   bitwise shift operators produce undefined behavior (because some operand is
   negative or too large).
 
+- The ``alpha.security.taint.TaintPropagation`` checker no longer propagates
+  taint on ``strlen`` and ``strnlen`` calls, unless these are marked
+  explicitly propagators in the user-provided taint configuration file.
+  This removal empirically reduces the number of false positive reports.
+  Read the PR for the details.
+  (`#66086 `_)
+
 .. _release-notes-sanitizers:
 
 Sanitizers
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
  ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
  ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 5da0f34b3d0464f..dae8ff0c5c8f1b8 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -695,9 +695,10 @@ void GenericTaintChecker::initTaintRules(CheckerContext 
) const {
   {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"wcslen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strnlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+
+  // strlen, wcslen, strnlen and alike intentionally don't propagate taint.
+  // See the details here: https://github.com/llvm/llvm-project/pull/66086
+
   {{{"strtol"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
   {{{"strtoll"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
   {{{"strtoul"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c 
b/clang/test/Analysis/taint-diagnostic-visitor.c
index f1b9ceebdd9a6b8..8a7510177f3e444 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -10,6 +10,7 @@ int scanf(const char *restrict format, ...);
 int system(const char *command);
 char* getenv( const char* env_var );
 size_t strlen( const char* str );
+int atoi( const char* str );
 void *malloc(size_t size );
 void free( void *ptr );
 char *fgets(char *str, int n, FILE *stream);
@@ -54,11 +55,11 @@ void taintDiagnosticVLA(void) {
 // propagating through variables and expressions
 char *taintDiagnosticPropagation(){
   char *pathbuf;
-  char *pathlist=getenv("PATH"); // expected-note {{Taint 

[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-18 Thread Balazs Benics via cfe-commits

steakhal wrote:

As I'm not a maintainer, I could not push to your branch.
Here is a patch that I think has the missing pieces to satisfy my review.
[0001-fixup-analyzer-TaintPropagation-checker-strlen-shoul.patch.txt](https://github.com/llvm/llvm-project/files/12645128/0001-fixup-analyzer-TaintPropagation-checker-strlen-shoul.patch.txt)
Apply it to your branch by `git am 
0001-fixup-analyzer-TaintPropagation-checker-strlen-shoul.patch.txt`.
After that, I'm okay to squash merge this PR, if you are also okay with my 
suggestions.


https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-16 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp updated 
https://github.com/llvm/llvm-project/pull/66086

>From f8997b16c74543eb57b272c4dd4abca1a10d9ac7 Mon Sep 17 00:00:00 2001
From: Daniel Krupp 
Date: Fri, 8 Sep 2023 16:57:49 +0200
Subject: [PATCH] [analyzer] TaintPropagation checker strlen() should not
 propagate

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings.
It is a common pattern to copy user provided input
to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when
the attacker can directly specify the size of the allocated area
as an arbitrary large number (e.g. the value is converted
from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function
with the consideration not to emit warnings when the tainted value
cannot be "arbitrarily large" (such as the size of an already allocated
memory block).
---
 clang/docs/analyzer/checkers.rst|  4 ++--
 .../StaticAnalyzer/Checkers/GenericTaintChecker.cpp |  3 ---
 clang/test/Analysis/taint-diagnostic-visitor.c  | 13 +++--
 clang/test/Analysis/taint-generic.c | 10 +-
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
  ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
  ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 5da0f34b3d0464f..95a759c251ca490 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -695,9 +695,6 @@ void GenericTaintChecker::initTaintRules(CheckerContext ) 
const {
   {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"wcslen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strnlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strtol"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
   {{{"strtoll"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
   {{{"strtoul"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c 
b/clang/test/Analysis/taint-diagnostic-visitor.c
index f1b9ceebdd9a6b8..8a7510177f3e444 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -10,6 +10,7 @@ int scanf(const char *restrict format, ...);
 int system(const char *command);
 char* getenv( const char* env_var );
 size_t strlen( const char* str );
+int atoi( const char* str );
 void *malloc(size_t size );
 void free( void *ptr );
 char *fgets(char *str, int n, FILE *stream);
@@ -54,11 +55,11 @@ void taintDiagnosticVLA(void) {
 // propagating through variables and expressions
 char *taintDiagnosticPropagation(){
   char *pathbuf;
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
  // expected-note@-1 {{Taint propagated to the 
return value}}
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
   // expected-note@-1 {{Taking true branch}}
-pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted 
data is used to specify the buffer size}}
+pathbuf=(char*) malloc(atoi(size)); // expected-warning{{Untrusted data is 
used to specify the buffer size}}
 // expected-note@-1{{Untrusted 
data is used to specify the buffer size}}
 // expected-note@-2 {{Taint 
propagated to the return value}}
 return pathbuf;
@@ -71,12 +72,12 @@ char *taintDiagnosticPropagation(){
 char *taintDiagnosticPropagation2(){
   

[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits

steakhal wrote:

Request another round of review once you are happy with the content and 
addressed the open comments.
On the grand scheme we are aligned.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits

steakhal wrote:

> Putting an upper bound on `strlen` is not just for `malloc`, it's also needed 
> for ArrayBoundV2.
> 
> As a very clear example, this [function `strfuzz_ends_with` from 
> twin](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline=twin_v0.8.1_ednikru_taint_nostrlen_new=on=Resolved=2648820=637b598311521fbe5ec9b657174d2862=%2arcopt.c)
>  functions iterates backwards on two strings and runs into an "Out of bounds 
> memory access (index is tainted)" false positive when it tries to access the 
> last character of a nonempty string.
> There are also other more complicated false positives from 
> [vim](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved=2649310=79dc8522d2cd34ca8e1b2dc2db64b2df=%2aos_unix.c)
>  and 
> [postgres](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline=postgres_REL_13_0_ednikru_taint_nostrlen_new=on=Resolved=2658495=bd04e4bd17058bb6c64a601f3dcfc23b=%2afe-protocol3.c).

Yes, I've also seen similar cases around the result of `getenv`. I believe we 
also discussed this in [D159105](https://reviews.llvm.org/D159105). 
The important thing here is that they are not that frequent and pretty easy to 
conclude if it's a TP or FP. 

> Based on these I'd say that propagating taint in `strlen` is not acceptable 
> without providing upper bounds. (These FPs are not extraordinarily common, 
> but it's simply _too ugly_ when the analyzer says that accessing the last 
> character of a string may be out of bounds access.)

Let's see how it works out in practice. I won't object to this change.

Do you also plan to partially revert `wcslen` in 
61924da630532c91f00351b7e84548eb42e2e1e0 to match the `strlen` behavior?

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread via cfe-commits

DonatNagyE wrote:

Putting an upper bound on `strlen` is not just for `malloc`, it's also needed 
for ArrayBoundV2.

As a very clear example, this [function `strfuzz_ends_with` from 
twin](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline=twin_v0.8.1_ednikru_taint_nostrlen_new=on=Resolved=2648820=637b598311521fbe5ec9b657174d2862=%2arcopt.c)
 functions iterates backwards on two strings and runs into an "Out of bounds 
memory access (index is tainted)" false positive when it tries to access the 
last character of a nonempty string.

There are also other more complicated false positives from 
[vim](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved=2649310=79dc8522d2cd34ca8e1b2dc2db64b2df=%2aos_unix.c)
 and 
[postgres](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline=postgres_REL_13_0_ednikru_taint_nostrlen_new=on=Resolved=2658495=bd04e4bd17058bb6c64a601f3dcfc23b=%2afe-protocol3.c).

Based on these I'd say that propagating taint in `strlen` is not acceptable 
without providing upper bounds. (These FPs are not extraordinarily common, but 
it's simply _too ugly_ when the analyzer says that accessing the last character 
of a string may be out of bounds access.)

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits

steakhal wrote:

> So for me either solution would work:
> a) remove strlen() as a propagator and note it in the checker doc
> b) remove malloc() as a sink and note it in the checker doc
> c) don't do anything and live with the false positives

TBH I would prefer (b). I see removing the whole `MsgTaintedBufferSize` is a 
bit too wild. And `alloca` with unconstrained tainted size should be still 
detected.
(c) is definitely not an option, assuming that there were honest motivations 
behind this patch.



> By the way, could you show an OOBV2 true positive that involves taint 
> propagated by `strlen` call? My impression was that the "index is tainted" 
> errors that involve `strlen` are typically false positives where the analyzer 
> cannot deduce that `s[strlen(s)]` is valid and e.g. `s[strlen(s)-1]` is also 
> valid when the string is nonempty.

I'm sure I've seen it in the Juliet suite. I believe something similar must 
have been the motivation for me proposing `wcslen` as a taint propagator 
because it was missing from the taint-dataflow. Since the test cases are 
generated, and there are `char` counterparts, I'd bet there would be a missing 
taint-flow there as well, if we didn't propagate on `strlen`.
I also understand that the Juliet is an artificial benchmark, that might or 
might not represent real-world scenarios.



> > Consequently, I agree with the raised problems, but I disagree with the 
> > approach. I would rather remove the `MsgTaintedBufferSize` diagnostic to 
> > resolve those FPs. Alternatively, we can also think of creating a heuristic 
> > to reduce such FPs. For e.g. check if the most significant bit of the 
> > allocation size is proven to be unset (aka. checked some meaningful 
> > upperbounds) and suppress reports in that case, but report otherwise. Would 
> > it be okay with you to proceed by not removing taint propagation?
> 
> I agree that `MsgTaintedBufferSize` reports should be limited to cases where 
> the tainted value _can be large_ and I would suggest a threshold that's 
> significantly lower than the SIZE_MAX/2 implied by your "most significant 
> bit" suggestion (ideally the threshold value would be configurable). However, 
> I think this eliminates the false positives related to `strlen` only if we 
> also add some logic that can reliably "tag" the return value of `strlen` with 
> reasonable upper bounds.

Yea, upperbounding `strlen` result I'm not sure if it really makes sense, just 
to fulfill the heuristic imposed on `malloc`.

Let's discuss then how much benefit warning on tainted malloc allocations 
provide.
If it turns out the tainted malloc does not really work, we should just remove 
that from sinks.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread via cfe-commits

DonatNagyE wrote:

> By looking at the disappeared reports you attached, I'm convinced that the 
> `MsgTaintedBufferSize` diagnostics give little to no benefit in general. On 
> the other side, I've seen good hits for OOBV2 in the presence of taint - even 
> if that's rarely the case. On the theory side, I also believe that 
> propagation should happen on `strlen` and similar functions.

You're right that theoretically speaking the `strlen`-like functions propagate 
taint, but their return value is practically always constrained from above (if 
the string has a null terminator, the length is less than its extent in memory) 
and I think that propagating taint without considering these constraint is 
worse than not propagating taint.

By the way, could you show an OOBV2 true positive that involves taint 
propagated by `strlen` call? My impression was that the "index is tainted" 
errors that involve `strlen` are typically false positives where the analyzer 
cannot deduce that `s[strlen(s)]` is valid and e.g. `s[strlen(s)-1]` is also 
valid when the string is nonempty.

> Consequently, I agree with the raised problems, but I disagree with the 
> approach. I would rather remove the `MsgTaintedBufferSize` diagnostic to 
> resolve those FPs. Alternatively, we can also think of creating a heuristic 
> to reduce such FPs. For e.g. check if the most significant bit of the 
> allocation size is proven to be unset (aka. checked some meaningful 
> upperbounds) and suppress reports in that case, but report otherwise. Would 
> it be okay with you to proceed by not removing taint propagation?

I agree that `MsgTaintedBufferSize` reports should be limited to cases where 
the tainted value _can be large_ and I would suggest a threshold that's 
significantly lower than the SIZE_MAX/2 implied by your "most significant bit" 
suggestion (ideally the threshold value would be configurable). However, I 
think this eliminates the false positives related to `strlen` only if we also 
add some logic that can reliably "tag" the return value of `strlen` with 
reasonable upper bounds.

I'd be happy to accept a patch that (1) ensures that `strlen` propagates taint 
(2) reliably puts upper bounds on the result of `strlen` and (3) limits 
`MsgTaintedBufferSize` to reporting cases where the tainted value can be large. 
However, as long as we don't have this refined code, I think the best temporary 
solution is removing taint propagation from `strlen` as the untainted return 
value is a good practical approximation for a tainted value with strong upper 
bounds.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Daniel Krupp via cfe-commits

dkrupp wrote:

If we remove the malloc(..) as the taint sink, we would lose some true positive 
findings where the size of the allocated
area is specified directly as a number by the attacker:
```
char *size=getenv("SIZE");
if (size){
pathbuf=(char*) malloc(atoi(size)+1); // warn: denial of service attack!
...
}
```
The above example is prone to denial of service attack as the the attacker just 
specifies an arbitrarily large number to which a buffer will be allocated. The 
attacker needs much less resources to specify a large number than the recevier 
to allocate a large chuck of memory.

On the other hand when we have a code like this:
```
char *user_txt=getenv("SIZE");
if (user_txt){ 
pathbuf=(char*) malloc(strlen(user_txt)+1); // No Warning as the malloc 
parameter comes from the size of an already allocated buffer
...
}
```
Here we should not warn as the size passed to malloc is the size of an already 
allocated buffer. So invested resources by the attacker to provide the large 
string and the server allocating another buffer to contain that string is 
symmetrical. So not prone to DoS attack.

A more sophisticated longer term solution could be that we add a flag to the 
taint info (or introduce a taint type) that the tainted value was originating 
from an existing buffer size and then specify the malloc sink so that it should 
not warn in that case. I know we cannot do this know, but the taint analysis 
could be extended into this direction.

Back to this solution.
Please note that this is only the default configuration of the checker.
The user could add the stren as a propagator into the taint config file.
If we decide to remove strlen() as a propagator (as is in this patch) we could 
highlight this in the documentation of the checker that the  user may want to 
add it back.

So for me either solution would work:
a) remove strlen() as a propagator and note it in the checker doc
b) remove malloc() as a sink and note it in the checker doc
c) don't do anything and live with the false positives

Which one would you prefer?

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits


@@ -915,24 +915,6 @@ void testStrndupa(size_t n) {
   clang_analyzer_isTainted_charp(result); // expected-warning {{YES}}
 }
 
-size_t strlen(const char *s);
-void testStrlen() {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strlen(s);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
-size_t strnlen(const char *s, size_t maxlen);
-void testStrnlen(size_t maxlen) {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strnlen(s, maxlen);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-

steakhal wrote:

In general, I oppose removing FN tests. They are good at documenting intent, if 
for nothing else.
It might be even better to add comments there about why we think it's okay and 
intentional to not propagate taint there. Also, adding a PR link would give the 
possibility to look deeper to understand the why.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits

https://github.com/steakhal requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-14 Thread Balazs Benics via cfe-commits

steakhal wrote:

I finished the review of this PR.

By looking at the disappeared reports you attached, I'm convinced that the 
`MsgTaintedBufferSize` diagnostics give little to no benefit in general. On the 
other side, I've seen good hits for OOBV2 in the presence of taint - even if 
that's rarely the case. On the theory side, I also believe that propagation 
should happen on `strlen` and similar functions.

Consequently, I agree with the raised problems, but I disagree with the 
approach.
I would rather remove the `MsgTaintedBufferSize` diagnostic to resolve those 
FPs.
Alternatively, we can also think of creating a heuristic to reduce such FPs. 
For e.g. check if the most significant bit of the allocation size is proven to 
be unset (aka. checked some meaningful upperbounds)  and suppress reports in 
that case, but report otherwise.
Would it be okay with you to proceed by not removing taint propagation?

On the same token, I think we should be able to separately enable/disable 
diagnostics on the GenericTaintChecker (including that they should not sink 
execution paths if they are disabled); but that's a different subject.


https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-13 Thread Balazs Benics via cfe-commits

steakhal wrote:

I actually wanted to propose another patch where the wchar variant of strlen 
would propagate taint, BTW. I still plan to do it, we will see when I reach 
that.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-13 Thread Balazs Benics via cfe-commits

steakhal wrote:

I can understand the frustration of the FPs. However, propagating taint there 
is the right thing to do.
To me, the fault is on the diagnostic on the malloc. Those are the cause of the 
FPs, thus that needs to be removed instead of the propagation.
I have this opinion even if the empirical results suggest that this would 
improve the perceived accuracy of the analysis. But to me, we would just mask 
the root cause.

I haven't looked the the content of the patch (yet), neither the diff's. I'll 
try to have a deeper look tomorrow.
I just wanted to share my concerns, after seeing an approval.

https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-13 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp review_requested 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-13 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp review_requested 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread via cfe-commits

llvmbot wrote:

@llvm/pr-subscribers-clang


Changes

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings. It is a common pattern to 
copy user provided input to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when the attacker can 
directly specify the size of the allocated area as an arbitrary large number 
(e.g. the value is converted from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function with the 
consideration not to emit warnings when the tainted value cannot be 
"arbitrarily large" (such as the size of an already allocated memory block).

The change has been evaluated on the following open source projects:

- memcached: [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline=memcached_1.6.8_ednikru_taint_nostrlen_new=on=Resolved)

- tmux: 0 lost reports
- twin [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline=twin_v0.8.1_ednikru_taint_nostrlen_new=on=Resolved)
- vim [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved)
- openssl 0 lost reports
- sqliste [2 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline=sqlite_version-3.33.0_ednikru_taint_nostrlen_new=on=Resolved)
- ffmpeg 0 lost repots
- postgresql [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline=postgres_REL_13_0_ednikru_taint_nostrlen_new=on=Resolved)
- tinyxml 0 lost reports
- libwebm 0 lost reports
- xerces 0 lost reports

In all cases the lost reports are originating from copying untrusted 
environment variables into another buffer.

There are 2 types of lost false positive reports:
1)  [Where the warning is emitted at the malloc call by the TaintPropagation 
Checker 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline=memcached_1.6.8_ednikru_taint_nostrlen_new=on=Resolved=2648506=2079221954026f17e1ecb614f5f054db=%2amemcached.c)
`
len = strlen(portnumber_filename)+4+1;
temp_portnumber_filename = malloc(len);
`

2) When pointers are set based on the length of the tainted string by the 
ArrayOutofBoundsv2 checker.
For example [this 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved=2649310=79dc8522d2cd34ca8e1b2dc2db64b2df=%2aos_unix.c)case.


--
Full diff: https://github.com/llvm/llvm-project/pull/66086.diff

4 Files Affected:

- (modified) clang/docs/analyzer/checkers.rst (+2-2) 
- (modified) clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp (-2) 
- (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+7-6) 
- (modified) clang/test/Analysis/taint-generic.c (-18) 



diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
  ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
  ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 3dcb45c0b110383..9df69c1ad1b525e 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -694,8 +694,6 @@ void GenericTaintChecker::initTaintRules(CheckerContext ) 
const {
   {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strnlen"}}, TR::Prop({{0}}, 

[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread via cfe-commits

llvmbot wrote:

@llvm/pr-subscribers-clang-static-analyzer-1


Changes

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings. It is a common pattern to 
copy user provided input to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when the attacker can 
directly specify the size of the allocated area as an arbitrary large number 
(e.g. the value is converted from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function with the 
consideration not to emit warnings when the tainted value cannot be 
"arbitrarily large" (such as the size of an already allocated memory block).

The change has been evaluated on the following open source projects:

- memcached: [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline=memcached_1.6.8_ednikru_taint_nostrlen_new=on=Resolved)

- tmux: 0 lost reports
- twin [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline=twin_v0.8.1_ednikru_taint_nostrlen_new=on=Resolved)
- vim [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved)
- openssl 0 lost reports
- sqliste [2 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline=sqlite_version-3.33.0_ednikru_taint_nostrlen_new=on=Resolved)
- ffmpeg 0 lost repots
- postgresql [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline=postgres_REL_13_0_ednikru_taint_nostrlen_new=on=Resolved)
- tinyxml 0 lost reports
- libwebm 0 lost reports
- xerces 0 lost reports

In all cases the lost reports are originating from copying untrusted 
environment variables into another buffer.

There are 2 types of lost false positive reports:
1)  [Where the warning is emitted at the malloc call by the TaintPropagation 
Checker 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline=memcached_1.6.8_ednikru_taint_nostrlen_new=on=Resolved=2648506=2079221954026f17e1ecb614f5f054db=%2amemcached.c)
`
len = strlen(portnumber_filename)+4+1;
temp_portnumber_filename = malloc(len);
`

2) When pointers are set based on the length of the tainted string by the 
ArrayOutofBoundsv2 checker.
For example [this 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved=2649310=79dc8522d2cd34ca8e1b2dc2db64b2df=%2aos_unix.c)case.


--
Full diff: https://github.com/llvm/llvm-project/pull/66086.diff

4 Files Affected:

- (modified) clang/docs/analyzer/checkers.rst (+2-2) 
- (modified) clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp (-2) 
- (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+7-6) 
- (modified) clang/test/Analysis/taint-generic.c (-18) 



diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
  ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
  ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 3dcb45c0b110383..9df69c1ad1b525e 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -694,8 +694,6 @@ void GenericTaintChecker::initTaintRules(CheckerContext ) 
const {
   {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
   {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strnlen"}}, TR::Prop({{0}}, 

[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread via cfe-commits

https://github.com/llvmbot labeled 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread via cfe-commits

https://github.com/llvmbot labeled 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread via cfe-commits

https://github.com/llvmbot labeled 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp review_requested 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp review_requested 
https://github.com/llvm/llvm-project/pull/66086
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

2023-09-12 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp created 
https://github.com/llvm/llvm-project/pull/66086:

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings. It is a common pattern to 
copy user provided input to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when the attacker can 
directly specify the size of the allocated area as an arbitrary large number 
(e.g. the value is converted from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function with the 
consideration not to emit warnings when the tainted value cannot be 
"arbitrarily large" (such as the size of an already allocated memory block).

The change has been evaluated on the following open source projects:

- memcached: [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline=memcached_1.6.8_ednikru_taint_nostrlen_new=on=Resolved)

- tmux: 0 lost reports
- twin [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline=twin_v0.8.1_ednikru_taint_nostrlen_new=on=Resolved)
- vim [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved)
- openssl 0 lost reports
- sqliste [2 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline=sqlite_version-3.33.0_ednikru_taint_nostrlen_new=on=Resolved)
- ffmpeg 0 lost repots
- postgresql [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline=postgres_REL_13_0_ednikru_taint_nostrlen_new=on=Resolved)
- tinyxml 0 lost reports
- libwebm 0 lost reports
- xerces 0 lost reports

In all cases the lost reports are originating from copying untrusted 
environment variables into another buffer.

There are 2 types of lost false positive reports:
1)  [Where the warning is emitted at the malloc call by the TaintPropagation 
Checker 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline=memcached_1.6.8_ednikru_taint_nostrlen_new=on=Resolved=2648506=2079221954026f17e1ecb614f5f054db=%2amemcached.c)
`
len = strlen(portnumber_filename)+4+1;
temp_portnumber_filename = malloc(len);
`

2) When pointers are set based on the length of the tainted string by the 
ArrayOutofBoundsv2 checker.
For example [this 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline=vim_v8.2.1920_ednikru_taint_nostrlen_new=on=Resolved=2649310=79dc8522d2cd34ca8e1b2dc2db64b2df=%2aos_unix.c)case.



>From 9c7674c39e1b07536f8c57bcdd2b07fb04b4873c Mon Sep 17 00:00:00 2001
From: Daniel Krupp 
Date: Fri, 8 Sep 2023 16:57:49 +0200
Subject: [PATCH] [analyzer] TaintPropagation checker strlen() should not
 propagate

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings.
It is a common pattern to copy user provided input
to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when
the attacker can directly specify the size of the allocated area
as an arbitrary large number (e.g. the value is converted
from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function
with the consideration not to emit warnings when the tainted value
cannot be "arbitrarily large" (such as the size of an already allocated
memory block).
---
 clang/docs/analyzer/checkers.rst   |  4 ++--
 .../Checkers/GenericTaintChecker.cpp   |  2 --
 clang/test/Analysis/taint-diagnostic-visitor.c | 13 +++--
 clang/test/Analysis/taint-generic.c| 18 --
 4 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+