Mulyadi Santosa writes:
>> Yes, I have read that paper, it’s wonderful!
>>
>> Besides the Argos, the bitblaze group, led by Dawn Song in Berkeley, has
>> achieved great success in the taint analysis. The website about their
>> dynamic analysis work (called TEMU) can be found at:
>> http://bitblaze.cs.berkeley.edu/temu.html
>>
>> And TEMU is now open-source.

> Thanks for sharing that...it's new stuff for me. So, why don't you
> just pick TEMU and improve it instead of...uhm...sorry if I am wrong,
> working from scratch? After all, I believe in both Argos and TEMU (and
> maybe other similar projects), they share common codes here and there.
> But ehm...CMIIW, seems like TEMU is based on Qemu 0.9,x, right? So
> it's.... sorry I forgot the name, the generated code is mostly a
> constructed by fragments of small codes generated by gcc. Now, it is
> qemu which does it by itself. So, a lot of things change
> (substantially).

I haven't read the TEMU work, but from the problem description I think you
want something similar to "Practical Taint-Based Protection using Demand
Emulation" or many others (I remember reading some of them a few years ago
at the ISCA, MICRO and/or ASPLOS conferences).

>> Yes. For each process’s memory space A, I wanna make a shadow memory B. The
>> shadow memory is used to store the tag of data. In other words, if addr in
>> memory A is tainted, then the corresponding byte in B should be marked to
>> indicate that addr in A is tainted.

The main question here is: at what granularity do you want to track?
Bytes? Words? Pages? This will greatly influence which approach is best.

Now that I think of it, you could use the tracing points I sent for guest
virtual memory accesses, and instrument them instead of calling a
file-tracing backend (this should provide a hook for an arbitrary
granularity). Then simply keep track of address-space changes as well, so
that your instrumentation code always knows when to activate propagation.

This, together with the optimization I sent for dynamic control of trace
generation in TCG emulation code, should get you on track. Of course, you
should still modify all register-accessing instructions to propagate
information passing through the register set.
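
To make the granularity question concrete, here is a minimal sketch of the shadow-memory idea discussed above, written as plain Python rather than QEMU code. All names (`ShadowMemory`, `TaintTracker`, `on_load`, `on_store`, `asid`) are hypothetical and are not part of QEMU or of any patch series; the sketch only assumes that an instrumentation hook receives an address-space identifier, a guest virtual address and an access size.

```python
class ShadowMemory:
    """Sparse taint map: one tag per 2**shift bytes, per address space."""

    def __init__(self, shift=0):
        self.shift = shift   # 0 = byte, 2 = 32-bit word, 12 = 4 KiB page
        self.tags = {}       # (asid, granule index) -> non-zero tag

    def _granules(self, addr, size):
        first = addr >> self.shift
        last = (addr + size - 1) >> self.shift
        return range(first, last + 1)

    def set_tag(self, asid, addr, size, tag):
        gsize = 1 << self.shift
        for g in self._granules(addr, size):
            key = (asid, g)
            # Does the access overwrite the whole granule?
            covered = addr <= g * gsize and (g + 1) * gsize <= addr + size
            if covered:
                if tag:
                    self.tags[key] = tag
                else:
                    self.tags.pop(key, None)   # clean granules are not stored
            elif tag:
                # Partial write of tainted data: merge conservatively.
                self.tags[key] = self.tags.get(key, 0) | tag
            # Partial write of clean data: keep the existing tag.

    def get_tag(self, asid, addr, size=1):
        tag = 0
        for g in self._granules(addr, size):
            tag |= self.tags.get((asid, g), 0)
        return tag


class TaintTracker:
    """Hypothetical load/store hooks that move tags through registers."""

    def __init__(self, shadow):
        self.shadow = shadow
        self.reg_tags = {}   # register name -> tag

    def on_load(self, asid, reg, addr, size):
        self.reg_tags[reg] = self.shadow.get_tag(asid, addr, size)

    def on_store(self, asid, reg, addr, size):
        self.shadow.set_tag(asid, addr, size, self.reg_tags.get(reg, 0))
```

With `shift=0` this behaves like the byte-per-byte shadow memory B described in the quoted text; a larger `shift` trades precision for space, since a partially overwritten granule stays tainted.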
For the register side, maybe you could start with the "fetch"
tracing/instrumentation point I sent a long time ago, which keeps track of
general-purpose register usage/definition on x86 (although I'm sure I
missed some uses due to the decoding complexity of x86).

>> The guest os collects “higher” semantic
>> from the OS level, and the QEMU collects “lower” semantic from the
>> instruction level. Combination of both semantics is necessary in the
>> analysis process.

> The question is, in a situation where malware already compromise "the
> higher semantic", could we trust the analysis?

Beware, I've read about exactly this kind of scheme at previous top-tier
conferences (but I think the tests used an architectural simulator, so
it's not for a current production environment).

I've found it :)

  Secure program execution via dynamic information flow tracking
  ASPLOS 2004

>> The question is: how to communicate between the QEMU and the guest OS, so
>> that they can cooperate with each other?

A few choices here, but you should first define whether the communication
must be based just on control signals and/or must provide memory storage:

* virtual device : If you need some kind of storage that the guest OS must
  access, you could look at the ivshmem device

* backdoor instruction : It's the simplest option; I sent some patch series
  recently with two different implementations for x86.

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth