[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-23 Thread via cfe-commits

https://github.com/NagyDonat closed 
https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread Gábor Horváth via cfe-commits

Xazax-hun wrote:

> @Xazax-hun WDYT?

If it really takes over an hour to analyze a file in clang-18, I'd agree that 
this has similar severity to a crash, and I support backporting the fix. 

https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread Balazs Benics via cfe-commits

steakhal wrote:

Ah, I think rushed ahead of myself.
I applied the patch to clang-17, where it of course we didn't have any issues 
even with this broken `isTainted`.
Now that I applied the patch to our clang-18 based branch, the file analyzes in 
1:40, which is still far off from the baseline run 32 seconds. But 1:40 is 
arguably a lot better than 1.23 hours.

I perfed again, and it shows this:
![image](https://github.com/llvm/llvm-project/assets/6280485/1706b0c7-bd5e-4de2-88f2-e49d801d0e12)

This is much better, and showcases the next slowdown bug I'm hunting for the 
days since I reported this one :D
The remaining 1 extra minute is lost during Z3 refutation.

So, yea, this PR fixes the bug I reported in #89045, so we can close that once 
this is merged.
And stay tuned for the next slowdown bug. I can already tell you that the 
bitwise shift checker introduces new constraints, after which we can now refute 
more bugs! - but also spends more time for finding a valid bug report in an 
EQClass , while previously we just picked the first one as we found those path 
constraints `sat`. But I'll come back to that once I have something concrete to 
share.

https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

Thanks!

The patch makes sense. Thank you for the promptly fix.
I've checked and this resolves the `FFmpeg` issue.
i wonder if this is serious enough to consider hangs as a crash and nominate 
this PR for backport to `clang-18`.
Leave me no doubt, we will have it downstream for sure, but upstream users 
might wanna also have this.
@Xazax-hun WDYT?

https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread Daniel Krupp via cfe-commits

https://github.com/dkrupp approved this pull request.

The suggested change make a lot of sense. Thanks.
LGTM.

https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (NagyDonat)


Changes

Previously the function
```
std::vectorSymbolRef taint::getTaintedSymbolsImpl(ProgramStateRef State,
const MemRegion *Reg,
TaintTagType K,
bool returnFirstOnly)
```
(one of the 8 overloaded variants under this name) was handling element regions 
in a highly inefficent manner: it performed the "also examine the super-region" 
step twice. (Once in the branch for element regions, and once in the more 
general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass 
of `SubRegion`.)

As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get 
a chain of N nested element regions where this inefficient recursion would 
proudce 2^N calls. I suspect that this issue might be behind 
https://github.com/llvm/llvm-project/issues/89045 (note that `sheervideo.c` 
does very complex pointer arithmetic).

This commit is essentially NFC, apart from the performance improvements and the 
removal of (probably irrelevant) duplicate entries from the return value of 
`getTaintedSymbols()` calls.

---
Full diff: https://github.com/llvm/llvm-project/pull/89606.diff


1 Files Affected:

- (modified) clang/lib/StaticAnalyzer/Checkers/Taint.cpp (+6-8) 


``diff
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp 
b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 4edb671753bf45..6362c82b009d72 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -216,21 +216,17 @@ std::vector 
taint::getTaintedSymbolsImpl(ProgramStateRef State,
   std::vector TaintedSymbols;
   if (!Reg)
 return TaintedSymbols;
-  // Element region (array element) is tainted if either the base or the offset
-  // are tainted.
+
+  // Element region (array element) is tainted if the offset is tainted.
   if (const ElementRegion *ER = dyn_cast(Reg)) {
 std::vector TaintedIndex =
 getTaintedSymbolsImpl(State, ER->getIndex(), K, returnFirstOnly);
 llvm::append_range(TaintedSymbols, TaintedIndex);
 if (returnFirstOnly && !TaintedSymbols.empty())
   return TaintedSymbols; // return early if needed
-std::vector TaintedSuperRegion =
-getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
-llvm::append_range(TaintedSymbols, TaintedSuperRegion);
-if (returnFirstOnly && !TaintedSymbols.empty())
-  return TaintedSymbols; // return early if needed
   }
 
+  // Symbolic region is tainted if the corresponding symbol is tainted.
   if (const SymbolicRegion *SR = dyn_cast(Reg)) {
 std::vector TaintedRegions =
 getTaintedSymbolsImpl(State, SR->getSymbol(), K, returnFirstOnly);
@@ -239,6 +235,8 @@ std::vector 
taint::getTaintedSymbolsImpl(ProgramStateRef State,
   return TaintedSymbols; // return early if needed
   }
 
+  // Any subregion (including Element and Symbolic regions) is tainted if its
+  // super-region is tainted.
   if (const SubRegion *ER = dyn_cast(Reg)) {
 std::vector TaintedSubRegions =
 getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
@@ -318,4 +316,4 @@ std::vector 
taint::getTaintedSymbolsImpl(ProgramStateRef State,
 }
   }
   return TaintedSymbols;
-}
\ No newline at end of file
+}

``




https://github.com/llvm/llvm-project/pull/89606
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix performance of getTaintedSymbolsImpl() (PR #89606)

2024-04-22 Thread via cfe-commits

https://github.com/NagyDonat created 
https://github.com/llvm/llvm-project/pull/89606

Previously the function
```
std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State,
const MemRegion *Reg,
TaintTagType K,
bool returnFirstOnly)
```
(one of the 8 overloaded variants under this name) was handling element regions 
in a highly inefficent manner: it performed the "also examine the super-region" 
step twice. (Once in the branch for element regions, and once in the more 
general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass 
of `SubRegion`.)

As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get 
a chain of N nested element regions where this inefficient recursion would 
proudce 2^N calls. I suspect that this issue might be behind 
https://github.com/llvm/llvm-project/issues/89045 (note that `sheervideo.c` 
does very complex pointer arithmetic).

This commit is essentially NFC, apart from the performance improvements and the 
removal of (probably irrelevant) duplicate entries from the return value of 
`getTaintedSymbols()` calls.

>From 2581f4858c02a51b88e15f657fc99d035c066bca Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 22 Apr 2024 15:32:27 +0200
Subject: [PATCH] [analyzer] Fix performance of getTaintedSymbolsImpl()

Previously the function
```
std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State,
const MemRegion *Reg,
TaintTagType K,
bool returnFirstOnly)
```
(one of the 8 overloaded variants under this name) was handling element
regions in a highly inefficent manner: it performed the "also examine
the super-region" step twice. (Once in the branch for element regions,
and once in the more general branch for all `SubRegion`s -- note that
`ElementRegion` is a subclass of `SubRegion`.)

As pointer arithmetic produces `ElementRegion`s, it's not too difficult
to get a chain of N nested element regions where this inefficient
recursion would proudce 2^N calls. I suspect that this issue might be
behind https://github.com/llvm/llvm-project/issues/89045
(note that `sheervideo.c` does very complex pointer arithmetic).

This commit is essentially NFC, apart from the performance improvements
and the removal of (probably irrelevant) duplicate entries from the
return value of `getTaintedSymbols()` calls.
---
 clang/lib/StaticAnalyzer/Checkers/Taint.cpp | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp 
b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 4edb671753bf45..6362c82b009d72 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -216,21 +216,17 @@ std::vector 
taint::getTaintedSymbolsImpl(ProgramStateRef State,
   std::vector TaintedSymbols;
   if (!Reg)
 return TaintedSymbols;
-  // Element region (array element) is tainted if either the base or the offset
-  // are tainted.
+
+  // Element region (array element) is tainted if the offset is tainted.
   if (const ElementRegion *ER = dyn_cast(Reg)) {
 std::vector TaintedIndex =
 getTaintedSymbolsImpl(State, ER->getIndex(), K, returnFirstOnly);
 llvm::append_range(TaintedSymbols, TaintedIndex);
 if (returnFirstOnly && !TaintedSymbols.empty())
   return TaintedSymbols; // return early if needed
-std::vector TaintedSuperRegion =
-getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
-llvm::append_range(TaintedSymbols, TaintedSuperRegion);
-if (returnFirstOnly && !TaintedSymbols.empty())
-  return TaintedSymbols; // return early if needed
   }
 
+  // Symbolic region is tainted if the corresponding symbol is tainted.
   if (const SymbolicRegion *SR = dyn_cast(Reg)) {
 std::vector TaintedRegions =
 getTaintedSymbolsImpl(State, SR->getSymbol(), K, returnFirstOnly);
@@ -239,6 +235,8 @@ std::vector 
taint::getTaintedSymbolsImpl(ProgramStateRef State,
   return TaintedSymbols; // return early if needed
   }
 
+  // Any subregion (including Element and Symbolic regions) is tainted if its
+  // super-region is tainted.
   if (const SubRegion *ER = dyn_cast(Reg)) {
 std::vector TaintedSubRegions =
 getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
@@ -318,4 +316,4 @@ std::vector 
taint::getTaintedSymbolsImpl(ProgramStateRef State,
 }
   }
   return TaintedSymbols;
-}
\ No newline at end of file
+}

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits