http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc
File src/ia32/macro-assembler-ia32.cc (right):

http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc#newcode902
src/ia32/macro-assembler-ia32.cc:902: // Because destination is 4-byte aligned, we keep it aligned for movs.
On 2011/01/14 10:47:37, Lasse Reichstein wrote:
> How do we know that destination is 4-byte aligned? Do you mean "If destination is 4-byte aligned"?

In our uses, source is 4-byte aligned. Changed comment.

http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc#newcode906
src/ia32/macro-assembler-ia32.cc:906: shr(ecx, 2);
On 2011/01/14 10:47:37, Lasse Reichstein wrote:
> If length is divisible by four, you will copy the last word twice. Subtract one
> from ecx before shifting, to copy one fewer word if (length & 3) == 0, but not
> otherwise.

A long rep movs averages much less than a cycle per word, and this change would save one word only 1/4 of the time (when length is a multiple of 4). Not worth the extra code, I think.
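A minimal sketch in portable C++ (not V8 code) of the word-count arithmetic in this exchange. `CopyBytesModel` and its `skip_redundant_word` flag are hypothetical: the function stores the final 4 bytes with one word copy, then copies whole 4-byte words from the start as a stand-in for `rep movs`. It returns how many bytes were stored, so the duplicated last word shows up when length is a multiple of 4. It assumes length >= 4 and models only which bytes are written, not timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the long-copy path discussed above; not V8 code.
// Returns how many bytes were stored, counting overlapping stores twice.
std::size_t CopyBytesModel(const std::uint8_t* source,
                           std::uint8_t* destination,
                           std::size_t length,
                           bool skip_redundant_word) {
  assert(length >= 4);  // the trailing word copy needs at least 4 bytes
  // Store the final 4 bytes first (the single, possibly unaligned, word move).
  std::memcpy(destination + length - 4, source + length - 4, 4);
  // Word count for the bulk copy: length >> 2 as in the patch, or
  // (length - 1) >> 2 as the reviewer suggests.
  const std::size_t words =
      skip_redundant_word ? (length - 1) >> 2 : length >> 2;
  // Stand-in for rep movs: copy whole 4-byte words from the start.
  std::memcpy(destination, source, words * 4);
  return 4 + words * 4;
}
```

With length 8, the `length >> 2` count stores 12 bytes (the last word twice), while `(length - 1) >> 2` stores exactly 8. With length 9 both counts give 2 words, so the adjustment changes nothing; it only helps when `(length & 3) == 0`.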

http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc#newcode921
src/ia32/macro-assembler-ia32.cc:921: dec(length);
On 2011/01/14 10:47:37, Lasse Reichstein wrote:
> Won't this be faster if you do:
>   add(source, length);
>   add(destination, length);
>   neg(length);
>   bind(&short_loop);
>   mov_b(scratch, Operand(source, ecx));
>   mov_b(Operand(destination, ecx), scratch);
>   inc(ecx);
>   j(not_zero, &short_loop);
> (It's quite possible that it won't. There is probably enough latency during
> the reads and writes to hide updating all three registers, and this version is
> actually one instruction longer in total.)

I tried that; it was slower.
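A portable C++ sketch of the two loop shapes being compared; the function names are hypothetical and this is not the V8 assembler code. In the thread, `length` lives in ecx (compare the `shr(ecx, 2)` discussion), which is why the suggested snippet uses both names. The sketch shows only the indexing scheme: the patch advances source, destination and the counter on every byte, while the suggestion moves both pointers to the end once and counts a negative index up to zero. A compiled C++ version will not reproduce the register-level timing measured above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical rendering of the patch's loop: three updates per byte.
void ShortCopyIncrementing(const std::uint8_t* source,
                           std::uint8_t* destination,
                           std::size_t length) {
  while (length != 0) {
    *destination = *source;
    ++source;       // inc(source)
    ++destination;  // inc(destination)
    --length;       // dec(length); loop ends when it reaches zero
  }
}

// Hypothetical rendering of the suggested loop: point both pointers at the
// end, then run a negative index up to zero, so one update per byte.
void ShortCopyNegativeIndex(const std::uint8_t* source,
                            std::uint8_t* destination,
                            std::size_t length) {
  source += length;       // add(source, length)
  destination += length;  // add(destination, length)
  for (std::ptrdiff_t i = -static_cast<std::ptrdiff_t>(length); i != 0; ++i) {
    destination[i] = source[i];  // negative offsets from the end pointers
  }
}
```

Both forms leave the same bytes in the destination; the trade-off is three setup operations against two fewer register updates per iteration, which is exactly what the measurement above had to decide.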

http://codereview.chromium.org/6148007/

--
v8-dev mailing list
http://groups.google.com/group/v8-dev
