On Mon, Aug 13, 2012 at 1:07 AM, Julian Seward <jsew...@acm.org> wrote:
>
> I lose track of what is actually required. Is it to implement, for
> vector loads, the same thing that you did for scalar loads? That is,
> don't complain about naturally aligned word size loads that partially
> overlap the end of a block, and instead simply mark the part of the
> register corresponding to the area beyond the end of the block, as
> undefined?
>
Yes, to a first approximation. At least, if it were done, I think I could
come up with patches for the rest.
I would propose the following approach.
Step 1: Add the infrastructure to allow 128-bit return values from helper
functions. We do not yet need 128-bit _arguments_, because addresses are
still just 64 bits. If you want to be forward-looking, allow 256-bit
return values too, since AVX2 is coming.
This step is the only thing I do not think I know how to do myself. If it
were finished, I would be willing to take a crack at the rest.
Step 2: Instead of emulating 128-bit loads with two calls to mc_LOADV64,
implement mc_LOADV128 returning a 128-bit result.
Step 3: Fix the logic for when partial_loads_ok takes effect. Right now,
the final test looks like this:
    if (szB == VG_WORDSIZE && VG_IS_WORD_ALIGNED(a)
        && n_addrs_bad < VG_WORDSIZE) {
In other words, the special handling only applies for aligned loads of the
natural word size of the machine. But this is not correct; on a real
system, any aligned load of any size can never "partially fault". So the
correct logic looks something like this:
    if (0 == (a % szB) && n_addrs_bad < szB) {
That is, if the address is aligned to the size of the load and the load was
only partially bad, then trigger the special handling. If you want to be
very thorough, add assertions for 0 == (szB & (szB - 1)) and (szB <
VKI_PAGE_SIZE). That is, assert that the load size is a power of 2 and
less than the page size. (Although if either of these assertions triggers
on any actual system ever, I promise to eat my hat and shave my eyebrows.)
Step 4: As John Reiser points out in another message, the validity bits
need more precise propagation. PCMPEQB needs to propagate the undefined
_bytes_. PMOVMSKB needs to propagate the undefined _bits_. "Compare
against zero" and "find first set" must give the right answer if they have
enough defined bits to determine it. This would be enough to handle every
case I have actually seen (*).
The test case attached to https://bugs.kde.org/show_bug.cgi?id=294285 exercises
all of these except "compare against zero".
Thanks.
- Pat
(*) For SSE2, anyway. Some non-vector implementations use standard 8-byte
registers and then do something like subtract 0x0101010101010101. Over
time, these should become less common relative to the superior vectorized
versions. In particular, the Intel compiler I am using never generates
them when it can use SSE2 instead.
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users