Hi Jim, Jim Keniston wrote: > Quoting Masami Hiramatsu <mhira...@redhat.com>: > >> Hi Jim and Sriker, >> >> Here, I almost rewrote my patch. >> >> Changelog: >> - rewrite decoding logic based on Intel' manual. >> - supoort insn_get_sib(),insn_get_displacement() >> and insn_get_immediate() too. >> - support 3 bytes opcode and 64bit immediate. >> - introduce some bitmaps. >> >> Thank you, > > Well, I didn't do much of a code review -- it looks like you addressed > all my concerns -- but as I mentioned on IRC, I hacked together a test > rig whereby you can disassemble a designated elf file (e.g., vmlinux, > libc, libm) and then compare insn_get_length()'s results with objdump's > results. The comment in distill.awk shows how to use objdump, awk, and > test_get_len together.
Thank you for review and test! > I also hacked up insn_x86.h and insn_x86.c to work in user space. Most > of that is accomplished via insn_x86_user.h, but it certainly isn't > necessary to do it that way. In particular, __u8, __s8, __u16, etc. are > versions of u8, s8, u16, etc. that can be used in both kernel and user > code, so maybe we should switch to those. > > I tested with vmlinux, libc, and libm on both an i686 system and an > x86_64 system. I found and fixed a few bugs. Here are the ones that > come to mind (all fixed): > - shrd/shld, which we discussed > - missing support for weird nops with modrm bytes (0f 1f ...). > - neglected to include the REX prefix in prefixes.nbytes > - missing static decl in an inline function in insn_x86.h Thank you for fixing it. BTW, it might have to support vm86 mode(especially, for user code). > There are some other cases where insn_get_length() doesn't match up with > the disassembly, but I don't consider them bugs: > - 0x9b is an instruction (fwait), but the disassembler treats it as a > prefix. For example 9b df ... can be disassembled as > fstsw ... // wait, then store status word > or > fwait // wait > fnstsw ... // store status word without waiting > Perhaps it's relevant to investigate whether a single-step of 9b df ... > would execute just the fwait or the whole fstsw. Anyway, this explains > the "failures" of finit and fstsw that I mentioned to you. I also saw > this with fstcw and fclex. FYI, there is a single wait/fwait instruction described at Intel software developers manual vol.2B p.399. > - Illegal instruction sequences, such as an x86_64 instruction that > starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these > when disassembling data. I just filtered out (via egrep) instructions > whose disassembly starts with "rex" or includes "(bad)". Sure, I think insn_* should return -EINVAL or set insn.invalid = 1 if we found those invalid ops. E.g. kernel use BUG() macro, it adds some raw numbers after ud2, in that case, those raw numbers might be decoded as an illegal instruction. > We could address the above by filtering them out in distill.awk or > test_get_len.c. I think we're clean otherwise. > > There's a little more housecleaning to do -- e.g., adding Hitachi (?) > copyright to IBM copyright, discarding insn_field_exists() and > insn_extract_reg(), putting this all in git somewhere. But not tonight. > > Pull all the attached files into a directory and have a go -- e.g., > $ make > $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64] > > Jim > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com