http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc
File src/ia32/macro-assembler-ia32.cc (right):

http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc#newcode902
src/ia32/macro-assembler-ia32.cc:902: // Because destination is 4-byte aligned, we keep it aligned for movs.
On 2011/01/14 10:47:37, Lasse Reichstein wrote:
How do we know that destination is 4-byte aligned?
Do you mean "If destination is 4-byte aligned"?

In our uses, source is 4-byte aligned.
Changed comment.

http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc#newcode906
src/ia32/macro-assembler-ia32.cc:906: shr(ecx, 2);
On 2011/01/14 10:47:37, Lasse Reichstein wrote:
If length was divisible by four, you will copy the last word twice.
Subtract one from ecx before shifting, to copy one less word if
(length & 3) == 0, but not otherwise.

Long rep.movs averages much less than a cycle per word.  This would save
one word only 1/4 of the time.  Not worth the extra code, I think.
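
To make the word-count arithmetic in this exchange concrete, here is a minimal C++ sketch. It assumes (the thread does not show it) that the code copies the last four bytes separately and then bulk-copies length >> 2 words from the start. The function name CopyTailThenWords and the subtract_one flag are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the copy under discussion, not the V8 code itself.
// Assumes length >= 4 and non-overlapping buffers.
// Returns the number of 4-byte words moved by the bulk loop.
std::size_t CopyTailThenWords(unsigned char* dst, const unsigned char* src,
                              std::size_t length, bool subtract_one) {
  assert(length >= 4);
  // Copy the last four bytes first (possibly unaligned).
  std::memcpy(dst + length - 4, src + length - 4, 4);
  // Word count for the bulk copy, standing in for shr(ecx, 2) + rep movs.
  // With subtract_one, this models dec(ecx) before the shift.
  std::size_t words = (subtract_one ? length - 1 : length) >> 2;
  for (std::size_t i = 0; i < words; ++i) {
    std::memcpy(dst + 4 * i, src + 4 * i, 4);
  }
  return words;
}
```

Under these assumptions both variants produce a correct copy. They differ only when (length & 3) == 0, where the subtracting variant moves one word fewer in the bulk loop; for the other three residues the word counts are identical.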

http://codereview.chromium.org/6148007/diff/11001/src/ia32/macro-assembler-ia32.cc#newcode921
src/ia32/macro-assembler-ia32.cc:921: dec(length);
On 2011/01/14 10:47:37, Lasse Reichstein wrote:
This won't be faster if you do:

  add(source, length);
  add(destination, length);
  neg(length);
  bind(&short_loop);
  mov_b(scratch, Operand(source, ecx));
  mov_b(Operand(destination, ecx), scratch);
  inc(ecx);
  j(not_zero, &short_loop);

(It's quite possible that it won't. There's probably lots of latency
available during reading and writing for updating all three registers, and
it's actually one instruction longer in total).

I tried that - it was slower.
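
For readers less used to the assembler idiom, the suggested loop can be sketched in plain C++. This is an illustrative rendering under stated assumptions, not the V8 implementation: the function names are hypothetical, length must be greater than zero (the loop tests its condition at the bottom, like the assembly), and the three-counter version is a guess at the shape of the loop under review.

```cpp
#include <cassert>
#include <cstddef>

// Assumed shape of the loop under review: three values updated per byte.
void CopyBytesThreeCounters(unsigned char* destination,
                            const unsigned char* source,
                            std::ptrdiff_t length) {
  assert(length > 0);
  do {
    *destination = *source;
    ++source;
    ++destination;
    --length;
  } while (length != 0);
}

// Suggested alternative: point both pointers one past the end, then walk a
// single negative index up to zero, so only one value changes per byte.
void CopyBytesNegativeIndex(unsigned char* destination,
                            const unsigned char* source,
                            std::ptrdiff_t length) {
  assert(length > 0);
  source += length;                 // add(source, length)
  destination += length;            // add(destination, length)
  std::ptrdiff_t index = -length;   // neg(length)
  do {
    destination[index] = source[index];
    ++index;                        // inc(ecx)
  } while (index != 0);             // j(not_zero, &short_loop)
}
```

Both functions copy the same bytes. Which one runs faster depends on the hardware and on how a compiler lowers them, so this sketch says nothing about the timing reported in the thread.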

http://codereview.chromium.org/6148007/

