Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-10-16 Thread zeljko
On Thursday 02 of June 2011 22:36:53 Florian Klämpfl wrote:
 Am 01.06.2011 22:07, schrieb Michalis Kamburelis:
  Hi,
  
  In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at
 
  least for some cases:
 I've commited an improved version in r17642

Is this merged in 2.4.5 ?

zeljko
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-03 Thread Florian Klämpfl
Am 03.06.2011 03:06, schrieb Michalis Kamburelis:
 Florian Klämpfl wrote:
 Am 01.06.2011 22:07, schrieb Michalis Kamburelis:
 Hi,

 In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at
 least for some cases:

 I've commited an improved version in r17642
 
 That's great :) I just tested with fpc from SVN (rev 17644), and can
 confirm that CompareMem is much faster now. It beats CompareMem from
 both FPC 2.4.4 and FPC 2.4.2:
 
 FPC 2.4.2: 2.635s
 FPC 2.4.4: 16.803s
 FPC trunk: 1.931s

I improved CompareDWord as well, for your application it should be even
better.

 
 Although it's still slightly slower than comparing directly with (V1[0]
 = V2[0]) and ..., which is ~1.2s for all 3 FPC versions mentioned
 above. I guess there has to be some price for CompareMem being more
 general, and checking size at runtime.

Yes.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-03 Thread Michalis Kamburelis
Florian Klämpfl wrote:
 
 I improved CompareDWord as well, for your application it should be even
 better.
 

Ha, I hesitated earlier to mention that CompareWord and CompareDWord are
~4 times slower than equivalent (on the same number of bytes)
CompareMem/CompareByte calls :) Cool, times with latest FPC (rev 17651):

CompareMem: 1.928s
CompareWord: 1.270s
CompareDWord: 1.108s

These times directly help in my algorithm (I use it to aggressively
search for pairs of matching edges for shadow volumes). Great!

Thanks,
Michalis
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread Sven Barth

On 01.06.2011 22:07, Michalis Kamburelis wrote:

Any thoughts? Maybe something can be improved?

1. Why CompareMem got slower in FPC 2.4.4? Maybe something can be fixed?



Let's see...

CompareMem in 2.4.2:

=== source begin ===

function CompareMem(P1, P2: Pointer; Length: cardinal): Boolean;
var
  i: cardinal;
begin
  Result:=True;
  I:=0;
  If (P1)(P2) then
While Result and (iLength) do
  begin
  Result:=PByte(P1)^=PByte(P2)^;
  Inc(I);
  Inc(pchar(P1));
  Inc(pchar(P2));
  end;
end;

=== source end ===

CompareMem in 2.4.4:

=== source begin ===

function CompareMem(P1, P2: Pointer; Length: cardinal): Boolean;
begin
  Result:=CompareByte(P1^,P2^,Length)=0;
end;

=== source end ===

This at least explains why CompareMem behaves similar to CompareByte in 
your test :D




2. The simple comparison (V1[0] = V2[0]) and... is much faster than
any CompareXxx. Any chance of improving it? In this case, size is known
at compile time, so maybe CompareXxx could be magic and (for
reasonably small sizes) the compiler could just generate a proper code
to compare them just like = operator? Just an idea of course, I don't
know how easy it would be to actually implement.


The CompareMem function is in unit SysUtils and I don't think it is a 
good idea to magicalize that unit.


Maybe it's better if you or someone else would try to improve the 
performance of the i386 assembler code that makes up the 
Compare(Byte/Word/DWord) functions (it's located in rtl/i386/i386.inc btw).


Regards,
Sven
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread Sergei Gorelkin

02.06.2011 19:49, Sven Barth пишет:


Maybe it's better if you or someone else would try to improve the performance 
of the i386 assembler
code that makes up the Compare(Byte/Word/DWord) functions (it's located in 
rtl/i386/i386.inc btw).

I tend to agree. The string instructions used there (rep cmpsb, etc.) are known to be not the 
fastest solution on modern x86 CPUs.


Sergei
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread José Mejuto
Hello FPC,

Wednesday, June 1, 2011, 10:07:18 PM, you wrote:

MK In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at
MK least for some cases:

The difference is that CompareMem takes the same time to check a
memory block if the elements are equal or different, while the pascal
code aborts inmediatly after first difference.

I had performed a fast test filling 2 megabytes of memory, the first
with value 1 and the second with value 2. Performing the test 10
millions times takes 610 milliseconds. In the second test fill both
arrays with same value, and CompareMem takes 610 milliseconds.

After some tests (not very professional) CompareMem takes much more
advantage if the amount of bytes to be compared are more than 32-38, so
the best performance I get is using this function:

function XCompareByte(const p1,p2,Length: sizeInt): SizeInt;
var
  j: SizeInt;
  PB1: PBYTE absolute p1;
  PB2: PBYTE absolute p2;
begin
  if Length=38 then begin
for j := 0 to Length-1 do begin
  if pb1[j]PB2[j] then begin
Result:=BYTE(PB1[j]-PB2[j]);
exit;
  end;
end;
Result:=0;
  end else begin
Result:=CompareByte(p1,p2,Length);
  end;
end;

In my computer Intel 9300 QCore WindowsXP this function get the best
of the two worlds, fast comparison escape for different memory blocks,
fast compare for small blocks and the rep cmpsl for larger blocks,
performance that can not be achieved with the pascal loop. The
comparison penalty if rapidly payed with the fast escape in case of
differeces in the very first bytes.

I'm quite sure the magic number is 32 not 38, but in my case I get
better performance with 38.

Simple test code:

procedure Comparememtest;
const
  COMPARELENGTH=1000;
var
  p1,p2: PBYTE;
  j: integer;
  ST,ET: TTimeStamp;
begin
  GetMem(p1,100);
  for j := 0 to 100-1 do begin
p1[j]:=0;
  end;
  GetMem(p2,100);
  for j := 0 to 100-1 do begin
p2[j]:=0;
  end;
  ST:=DateTimeToTimeStamp(Now);
  for j := 0 to 1000 do begin
XCompareByte(SizeInt(p1),SizeInt(p2),COMPARELENGTH);
  end;
  ET:=DateTimeToTimeStamp(Now);
  self.Caption:=inttostr(ET.Time-ST.Time);
  FreeMem(p1);
  FreeMem(p2);
end;

Times table:
 PlainCompare  CompareByteXCompareByte
   ---
Equal arrays 1000 elements   16250 ms  625 ms 656 ms
Diff. arrays 1000 elements  62 ms  640 ms 656 ms

Equal arrays 32 elements   547 ms  625 ms 547 ms
Diff. arrays 32 elements62 ms  640 ms  62 ms

-- 
Best regards,
 José

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread Florian Klämpfl
Am 02.06.2011 18:45, schrieb José Mejuto:
 Hello FPC,
 
  PlainCompare  CompareByteXCompareByte
    ---
 Equal arrays 1000 elements   16250 ms  625 ms 656 ms
 Diff. arrays 1000 elements  62 ms  640 ms 656 ms
 
 Equal arrays 32 elements   547 ms  625 ms 547 ms
 Diff. arrays 32 elements62 ms  640 ms  62 ms
 

I improved the original CompareByte, please tell me how it works for you.
function CompareByte(Const buf1,buf2;len:SizeInt):SizeInt; assembler;
var
  saveesi,saveedi : longint;
  savebl : byte;
asm
{$ifdef REGCALL}
cmpl$57,%ecx
jg  .LCmpbyteFull

movb%bl,savebl
.LCmpbyteLoop:
movb(%eax),%bl
cmpb(%edx),%bl
leal1(%eax),%eax
leal1(%edx),%edx
jne .LCmpbyteExitFast
decl%ecx
jne .LCmpbyteLoop
.LCmpbyteExitFast:
seteb   %al
movbsavebl,%bl
leave
ret
.LCmpbyteFull:
movl%edi,saveedi
movl%esi,saveesi
cld
movl%eax,%edi
movl%edx,%esi
movl%ecx,%eax
{$else}
movl%edi,saveedi
movl%esi,saveesi
cld
movlbuf2,%esi   { Load params}
movlbuf1,%edi
movllen,%eax
{$endif}
testl   %eax,%eax   {We address -1(%esi), so we have to deal with 
len=0}
je  .LCmpbyteExit
cmpl$10,%eax {7 not worth aligning and go through all 
trouble}
jl  .LCmpbyte2
movl%edi,%ecx   { Align on 32bits }
negl%ecx{ calc bytes to align   (%edi and 3) xor 3= 
-%edi and 3}
andl$3,%ecx
subl%ecx,%eax   { Subtract from number of bytes to go}
orl %ecx,%ecx
rep
cmpsb   {The actual 32-bit Aligning}
jne .LCmpbyte3
movl%eax,%ecx   {bytes to do, divide by 4}
andl$3,%eax {remainder}
shrl$2,%ecx {The actual division}
orl %ecx,%ecx   {Sets zero flag if ecx=0 - no cmp}
rep
cmpsl
je  .LCmpbyte2   { All equal? then to the left over bytes}
movl$4,%eax { Not equal. Rescan the last 4 bytes bytewise}
subl%eax,%esi
subl%eax,%edi
.LCmpbyte2:
movl%eax,%ecx   {bytes still to (re)scan}
orl %eax,%eax   {prevent disaster in case %eax=0}
.balign 4
rep
cmpsb
.LCmpbyte3:
movzbl  -1(%esi),%ecx
movzbl  -1(%edi),%eax  // Compare failing (or equal) position
subl%ecx,%eax
.LCmpbyteExit:
movlsaveedi,%edi
movlsaveesi,%esi
end;
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread José Mejuto
Hello FPC,

Wednesday, June 1, 2011, 10:07:18 PM, you wrote:

MK In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at
MK least for some cases:

Please take care with my last email, it has a bug that result in wrong
speed tests, anyway after fixing the bug, the magic number for plain
compare is around 28-32, so for blocks of 28-32 or less bytes, a plain
compare is much faster than CompareMem even when the memory blocks are
identical.

-- 
Best regards,
 José

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread Florian Klämpfl
Am 01.06.2011 22:07, schrieb Michalis Kamburelis:
 Hi,
 
 In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at
 least for some cases:

I've commited an improved version in r17642
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] CompareMem slower in FPC 2.4.4

2011-06-02 Thread Michalis Kamburelis
Florian Klämpfl wrote:
 Am 01.06.2011 22:07, schrieb Michalis Kamburelis:
 Hi,

 In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at
 least for some cases:
 
 I've commited an improved version in r17642

That's great :) I just tested with fpc from SVN (rev 17644), and can
confirm that CompareMem is much faster now. It beats CompareMem from
both FPC 2.4.4 and FPC 2.4.2:

FPC 2.4.2: 2.635s
FPC 2.4.4: 16.803s
FPC trunk: 1.931s

Although it's still slightly slower than comparing directly with (V1[0]
= V2[0]) and ..., which is ~1.2s for all 3 FPC versions mentioned
above. I guess there has to be some price for CompareMem being more
general, and checking size at runtime.

Thanks everyone,
Michalis
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel