It might be worth investigating which memcpy calls are problematic, if you
haven't already. I suspect that MemoryRead and MemoryWrite are the main culprits
but that is just a gut feeling.

Also, with a default -O0 debug build, GCC probably doesn't try to optimise
memcpy, so the memcpy calls might have more impact in a debug build than in a
release build. However, there's no reason why a debug build has to be -O0. (I
find that ARM code is easier to read at -O2 anyway, on the rare occasions that
I have to delve into disassembled code.)

If we can't get a type-aliasing-safe implementation (using memcpy) to optimise
properly, another option is to pass -fno-strict-aliasing to GCC (at least for
the simulator), which disables all optimisations that rely on strict aliasing.
Clang accepts the same flag. It's a bit of a hack though, and it might
disable optimisations elsewhere. (I've no idea what performance impact that
might have. It might be negligible.)


https://codereview.chromium.org/169223004/diff/1/src/a64/instructions-a64.h
File src/a64/instructions-a64.h (left):

https://codereview.chromium.org/169223004/diff/1/src/a64/instructions-a64.h#oldcode120
src/a64/instructions-a64.h:120: memcpy(&bits, this, sizeof(bits));
These simple (fixed-size) memcpy calls should be free. The compiler
should recognise them and compile each one as a no-op.
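
For reference, this is the kind of pattern I mean. It is only a minimal
sketch: FloatToRawBits is a hypothetical helper, not something from this
patch. Because the size is a compile-time constant, an optimising build
would normally be expected to reduce the call to a register move.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): returns the raw bit pattern of a
// float without casting between unrelated pointer types.
inline uint32_t FloatToRawBits(float value) {
  uint32_t bits;
  static_assert(sizeof(bits) == sizeof(value), "size mismatch");
  // Fixed-size copy: the size is known at compile time, so the compiler can
  // usually inline it instead of emitting a call to memcpy.
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}
```

Comparing the disassembly of a function like this at -O0 and -O2 should show
whether the call is really being elided.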

https://codereview.chromium.org/169223004/diff/1/src/a64/simulator-a64.cc
File src/a64/simulator-a64.cc (left):

https://codereview.chromium.org/169223004/diff/1/src/a64/simulator-a64.cc#oldcode142
src/a64/simulator-a64.cc:142: memcpy(stack, &(*it), sizeof(*it));
This copy might not be free because the address needs to be derived from
the iterator. However, this shouldn't be performance-sensitive code.

https://codereview.chromium.org/169223004/diff/1/src/a64/simulator-a64.cc#oldcode1472
src/a64/simulator-a64.cc:1472: memcpy(&read, address, num_bytes);
The size is variable here, so this copy can't be free.

It might be possible for the compiler to optimise it if you split the
num_bytes cases (as you did for reinterpret_cast) but use a fixed memcpy
in each case.
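
Something along these lines, as a rough sketch only. MemoryReadSketch and its
signature are my own invention for illustration; the real MemoryRead may take
different arguments. The sketch zero-extends the result to 64 bits, and
whether that matches what the simulator needs is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the simulator's MemoryRead. Each case uses a
// memcpy whose size is a compile-time constant, so the compiler has a chance
// to turn it into a single load.
inline uint64_t MemoryReadSketch(const uint8_t* address, unsigned num_bytes) {
  switch (num_bytes) {
    case 1: {
      uint8_t value;
      std::memcpy(&value, address, sizeof(value));
      return value;  // Zero-extended to 64 bits.
    }
    case 2: {
      uint16_t value;
      std::memcpy(&value, address, sizeof(value));
      return value;
    }
    case 4: {
      uint32_t value;
      std::memcpy(&value, address, sizeof(value));
      return value;
    }
    case 8: {
      uint64_t value;
      std::memcpy(&value, address, sizeof(value));
      return value;
    }
    default:
      // Assumption: only power-of-two sizes up to 8 bytes are supported.
      assert(false && "unsupported access size");
      return 0;
  }
}
```

The switch adds a branch, so it would be worth measuring before committing to
it.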

https://codereview.chromium.org/169223004/diff/1/src/a64/simulator-a64.cc#oldcode1514
src/a64/simulator-a64.cc:1514: memcpy(address, &value, num_bytes);
Ditto.
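
A possible shape for the write side, again only as a sketch.
MemoryWriteSketch is a hypothetical name and signature; it truncates the
64-bit value to the access size before copying, which is an assumption about
the intended behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the simulator's MemoryWrite. The value is
// narrowed to the access size first, then stored with a fixed-size memcpy.
inline void MemoryWriteSketch(uint8_t* address, uint64_t value,
                              unsigned num_bytes) {
  switch (num_bytes) {
    case 1: {
      const uint8_t narrow = static_cast<uint8_t>(value);
      std::memcpy(address, &narrow, sizeof(narrow));
      break;
    }
    case 2: {
      const uint16_t narrow = static_cast<uint16_t>(value);
      std::memcpy(address, &narrow, sizeof(narrow));
      break;
    }
    case 4: {
      const uint32_t narrow = static_cast<uint32_t>(value);
      std::memcpy(address, &narrow, sizeof(narrow));
      break;
    }
    case 8:
      std::memcpy(address, &value, sizeof(value));
      break;
    default:
      // Assumption: only power-of-two sizes up to 8 bytes are supported.
      assert(false && "unsupported access size");
      break;
  }
}
```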

https://codereview.chromium.org/169223004/
