[klee-dev] Dealing with missing intrinsics (Rust support)

2020-08-14 Thread Alastair Reid
Hi,


I want to check what the best way is to deal with missing intrinsics when
adapting KLEE for a new language and new standard libraries.
In particular, what kind of changes should I make and submit as PRs?

I am trying to use KLEE with a modest sized Rust application (about 20kloc)
but that depends on over 100 other libraries ("crates") and Rust's fairly
rich standard library.
Given the amount of code involved, it is unsurprising that the resulting
LLVM bitcode files use some intrinsics that KLEE does not support.
I don't know whether I have a complete list but at least the following are
needed:

maxnum   minnum
x86_avx2_pshuf_b x86_ssse3_pshuf_b_128
x86_avx2_pslli_d x86_avx2_psrli_d
x86_avx_ldu_dq_256   x86_sse3_ldu_dq
x86_sse2_pslli_d x86_sse2_psrli_d
x86_sse2_pause
x86_avx_vzeroupper

As far as I can tell, KLEE checks that it accepts all intrinsics before
starting symbolic execution.
So it can reject a program even if there is no feasible path these
intrinsics just because they are in the bitcode file.
I have seen this happen in my current trials (on short test programs that
only use the standard library).


I see three solutions - with various pros/cons

1) Implement all missing intrinsics.
This seems like a good option for target independent intrinsics like
maxnum (IEEE float min/max)
And it might be necessary for x86_sse2_pause because LLVM will emit
this instruction even if you disable SSE2 (this instruction is a NOP on old
x86 cores - it makes spinlocks faster on modern cores).

The disadvantage is that implementing all LLVM intrinsics is a
Sisyphean task.
(I have not even explored what intrinsics are required on Arm - which
is a much more
widespread ISA than x86!)

2) Compile the libraries with AVX and SSE features disabled.
This should fix most of the intrinsics (except for
maxnum/minnum/x86_sse2_pause).

The disadvantage of messing with compilation options is that this
becomes quite hard
 to do in large, complex projects.

3) Change KLEE to ignore any missing intrinsics unless it actually executes
them.
This would make option 1 less painful because I would only need to
implement
intrinsics that are truly needed.

I have a change that would do this - but I don't think it is ready to
submit as a PR yet.
(diff attached at end).

(Using this change, I got a bit further with the application - I was
able to start running
the code in KLEE but it then got stuck because one of the libraries
contains a mix of
Rust, C and assembly code. But that's a problem for next week...)

I am currently leaning towards option 3 but I want to check which is
preferred and there are a few possible variants (sketched at the end)

--
Alastair Reid

ps Here is the change. It ignores all intrinsics on the above list during
the Cleaner pass (but still reports an error if it hits them during
execution).

The main goal of this code is to avoid calling
"IL->LowerIntrinsicCall(ii);" in the default case because that LLVM
function fails an assertion if it cannot lower the call.
(The function is defined in LLVM in llvm/lib/CodeGen/IntrinsicLowering.cpp)

Some things that I am unsure about are:
- I am a bit uneasy about hardcoding the list of intrinsics to ignore.
- Is there a need for a command line option to enable/disable ignoring
unknown intrinsics.

diff --git a/lib/Module/IntrinsicCleaner.cpp
b/lib/Module/IntrinsicCleaner.cpp
index a1d4fdda..e88d5300 100644
--- a/lib/Module/IntrinsicCleaner.cpp
+++ b/lib/Module/IntrinsicCleaner.cpp
@@ -10,6 +10,7 @@
 #include "Passes.h"

 #include "klee/Config/Version.h"
+#include "klee/Support/ErrorHandling.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/IR/Constants.h"
@@ -20,6 +21,7 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsX86.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Type.h"
 #include "llvm/Pass.h"
@@ -340,6 +342,33 @@ bool IntrinsicCleanerPass::runOnBasicBlock(BasicBlock
, Module ) {
 break;
   }
 #endif
+
+// The following intrinsics are known to be missing from KLEE
+// but can be generated by Rustc and are found in libcore, libstd,
+// common Rust crates, etc.
+//
+// Adding an intrinsic to this list allows KLEE to execute
programs that
+// use these intrinsics as long as the paths explored by KLEE
+// do not invoke these intrinsics.
+  case Intrinsic::maxnum:
+  case Intrinsic::minnum:
+  case Intrinsic::x86_avx2_pshuf_b:
+  case Intrinsic::x86_avx2_pslli_d:
+  case Intrinsic::x86_avx2_psrli_d:
+  case Intrinsic::x86_avx_ldu_dq_256:
+  case Intrinsic::x86_avx_vzeroupper:
+  case Intrinsic::x86_sse2_pause:
+  case Intrinsic::x86_sse2_pslli_d:
+  case Intrinsic::x86_sse2_psrli_d:
+  case 

Re: [klee-dev] Dealing with missing intrinsics (Rust support)

2020-08-14 Thread Cristian Cadar
+1 to this, we should only terminate those paths that hit unhandled 
intrinsics.  This is what we do with unavailable external functions and 
inline assembly for instance, so we should adopt the same behaviour here.


Best,
Cristian

On 14/08/2020 18:06, Nowack, Martin wrote:

Hi Alastair,

I would also rather use option 3.
Normally, the `LowerIntrinsic` pass should try to replace appropriate 
calls if possible. It should not terminate it.


Beside your attached code, I would suggest to change the following code 
line in `lib/Core/Executor.cpp`:

https://github.com/klee/klee/blob/088fc21e12c9675161172899be5ef5051f1ae96b/lib/Core/Executor.cpp#L1514

Instead of an `klee_error` which terminates the whole KLEE execution, 
one could terminate the current path.
This allows to explore other paths that don’t contain such instructions 
or at least explore a path until it reaches such instruction, which I 
guess might be valuable in itself.



As you mentioned inline assembly, there is a currently a pass which 
tries to lift mostly memory barriers into LLVM intrinsics that are 
handles as NOPs. More complex things are hard to handle.

(https://github.com/klee/klee/blob/088fc21e12c9675161172899be5ef5051f1ae96b/lib/Module/RaiseAsm.cpp)

Hope that helps.

Best,
Martin

On 14. Aug 2020, at 17:33, Alastair Reid > wrote:


Hi,


I want to check what the best way is to deal with missing intrinsics 
when adapting KLEE for a new language and new standard libraries.

In particular, what kind of changes should I make and submit as PRs?

I am trying to use KLEE with a modest sized Rust application (about 
20kloc) but that depends on over 100 other libraries ("crates") and 
Rust's fairly rich standard library.
Given the amount of code involved, it is unsurprising that the 
resulting LLVM bitcode files use some intrinsics that KLEE does not 
support.
I don't know whether I have a complete list but at least the following 
are needed:


    maxnum               minnum
    x86_avx2_pshuf_b     x86_ssse3_pshuf_b_128
    x86_avx2_pslli_d     x86_avx2_psrli_d
    x86_avx_ldu_dq_256   x86_sse3_ldu_dq
    x86_sse2_pslli_d     x86_sse2_psrli_d
    x86_sse2_pause
    x86_avx_vzeroupper

As far as I can tell, KLEE checks that it accepts all intrinsics 
before starting symbolic execution.
So it can reject a program even if there is no feasible path these 
intrinsics just because they are in the bitcode file.
I have seen this happen in my current trials (on short test programs 
that only use the standard library).



I see three solutions - with various pros/cons

1) Implement all missing intrinsics.
    This seems like a good option for target independent intrinsics 
like maxnum (IEEE float min/max)
    And it might be necessary for x86_sse2_pause because LLVM will 
emit this instruction even if you disable SSE2 (this instruction is a 
NOP on old x86 cores - it makes spinlocks faster on modern cores).


    The disadvantage is that implementing all LLVM intrinsics is a 
Sisyphean task.
    (I have not even explored what intrinsics are required on Arm - 
which is a much more

    widespread ISA than x86!)

2) Compile the libraries with AVX and SSE features disabled.
    This should fix most of the intrinsics (except for 
maxnum/minnum/x86_sse2_pause).


    The disadvantage of messing with compilation options is that this 
becomes quite hard

     to do in large, complex projects.

3) Change KLEE to ignore any missing intrinsics unless it actually 
executes them.
    This would make option 1 less painful because I would only need to 
implement

    intrinsics that are truly needed.

    I have a change that would do this - but I don't think it is ready 
to submit as a PR yet.

    (diff attached at end).

    (Using this change, I got a bit further with the application - I 
was able to start running
    the code in KLEE but it then got stuck because one of the 
libraries contains a mix of

    Rust, C and assembly code. But that's a problem for next week...)

I am currently leaning towards option 3 but I want to check which is 
preferred and there are a few possible variants (sketched at the end)


--
Alastair Reid

ps Here is the change. It ignores all intrinsics on the above list 
during the Cleaner pass (but still reports an error if it hits them 
during execution).


The main goal of this code is to avoid calling 
"IL->LowerIntrinsicCall(ii);" in the default case because that LLVM 
function fails an assertion if it cannot lower the call.
(The function is defined in LLVM in 
llvm/lib/CodeGen/IntrinsicLowering.cpp)


Some things that I am unsure about are:
- I am a bit uneasy about hardcoding the list of intrinsics to ignore.
- Is there a need for a command line option to enable/disable ignoring 
unknown intrinsics.


diff --git a/lib/Module/IntrinsicCleaner.cpp 
b/lib/Module/IntrinsicCleaner.cpp

index a1d4fdda..e88d5300 100644
--- a/lib/Module/IntrinsicCleaner.cpp
+++ 

Re: [klee-dev] Dealing with missing intrinsics (Rust support)

2020-08-14 Thread Nowack, Martin
Hi Alastair,

I would also rather use option 3.
Normally, the `LowerIntrinsic` pass should try to replace appropriate calls if 
possible. It should not terminate it.

Beside your attached code, I would suggest to change the following code line in 
`lib/Core/Executor.cpp`:
https://github.com/klee/klee/blob/088fc21e12c9675161172899be5ef5051f1ae96b/lib/Core/Executor.cpp#L1514

Instead of an `klee_error` which terminates the whole KLEE execution, one could 
terminate the current path.
This allows to explore other paths that don’t contain such instructions or at 
least explore a path until it reaches such instruction, which I guess might be 
valuable in itself.


As you mentioned inline assembly, there is a currently a pass which tries to 
lift mostly memory barriers into LLVM intrinsics that are handles as NOPs. More 
complex things are hard to handle.
(https://github.com/klee/klee/blob/088fc21e12c9675161172899be5ef5051f1ae96b/lib/Module/RaiseAsm.cpp)

Hope that helps.

Best,
Martin

On 14. Aug 2020, at 17:33, Alastair Reid 
mailto:adr...@google.com>> wrote:

Hi,


I want to check what the best way is to deal with missing intrinsics when 
adapting KLEE for a new language and new standard libraries.
In particular, what kind of changes should I make and submit as PRs?

I am trying to use KLEE with a modest sized Rust application (about 20kloc) but 
that depends on over 100 other libraries ("crates") and Rust's fairly rich 
standard library.
Given the amount of code involved, it is unsurprising that the resulting LLVM 
bitcode files use some intrinsics that KLEE does not support.
I don't know whether I have a complete list but at least the following are 
needed:

maxnum   minnum
x86_avx2_pshuf_b x86_ssse3_pshuf_b_128
x86_avx2_pslli_d x86_avx2_psrli_d
x86_avx_ldu_dq_256   x86_sse3_ldu_dq
x86_sse2_pslli_d x86_sse2_psrli_d
x86_sse2_pause
x86_avx_vzeroupper

As far as I can tell, KLEE checks that it accepts all intrinsics before 
starting symbolic execution.
So it can reject a program even if there is no feasible path these intrinsics 
just because they are in the bitcode file.
I have seen this happen in my current trials (on short test programs that only 
use the standard library).


I see three solutions - with various pros/cons

1) Implement all missing intrinsics.
This seems like a good option for target independent intrinsics like maxnum 
(IEEE float min/max)
And it might be necessary for x86_sse2_pause because LLVM will emit this 
instruction even if you disable SSE2 (this instruction is a NOP on old x86 
cores - it makes spinlocks faster on modern cores).

The disadvantage is that implementing all LLVM intrinsics is a Sisyphean 
task.
(I have not even explored what intrinsics are required on Arm - which is a 
much more
widespread ISA than x86!)

2) Compile the libraries with AVX and SSE features disabled.
This should fix most of the intrinsics (except for 
maxnum/minnum/x86_sse2_pause).

The disadvantage of messing with compilation options is that this becomes 
quite hard
 to do in large, complex projects.

3) Change KLEE to ignore any missing intrinsics unless it actually executes 
them.
This would make option 1 less painful because I would only need to implement
intrinsics that are truly needed.

I have a change that would do this - but I don't think it is ready to 
submit as a PR yet.
(diff attached at end).

(Using this change, I got a bit further with the application - I was able 
to start running
the code in KLEE but it then got stuck because one of the libraries 
contains a mix of
Rust, C and assembly code. But that's a problem for next week...)

I am currently leaning towards option 3 but I want to check which is preferred 
and there are a few possible variants (sketched at the end)

--
Alastair Reid

ps Here is the change. It ignores all intrinsics on the above list during the 
Cleaner pass (but still reports an error if it hits them during execution).

The main goal of this code is to avoid calling "IL->LowerIntrinsicCall(ii);" in 
the default case because that LLVM function fails an assertion if it cannot 
lower the call.
(The function is defined in LLVM in llvm/lib/CodeGen/IntrinsicLowering.cpp)

Some things that I am unsure about are:
- I am a bit uneasy about hardcoding the list of intrinsics to ignore.
- Is there a need for a command line option to enable/disable ignoring unknown 
intrinsics.

diff --git a/lib/Module/IntrinsicCleaner.cpp b/lib/Module/IntrinsicCleaner.cpp
index a1d4fdda..e88d5300 100644
--- a/lib/Module/IntrinsicCleaner.cpp
+++ b/lib/Module/IntrinsicCleaner.cpp
@@ -10,6 +10,7 @@
 #include "Passes.h"

 #include "klee/Config/Version.h"
+#include "klee/Support/ErrorHandling.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/IR/Constants.h"
@@ -20,6 +21,7 @@
 #include