Re: [fpc-devel] CompareMem slower in FPC 2.4.4
On Thursday 02 of June 2011 22:36:53 Florian Klämpfl wrote: Am 01.06.2011 22:07, schrieb Michalis Kamburelis: Hi, In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at least for some cases: I've commited an improved version in r17642 Is this merged in 2.4.5 ? zeljko ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Am 03.06.2011 03:06, schrieb Michalis Kamburelis: Florian Klämpfl wrote: Am 01.06.2011 22:07, schrieb Michalis Kamburelis: Hi, In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at least for some cases: I've commited an improved version in r17642 That's great :) I just tested with fpc from SVN (rev 17644), and can confirm that CompareMem is much faster now. It beats CompareMem from both FPC 2.4.4 and FPC 2.4.2: FPC 2.4.2: 2.635s FPC 2.4.4: 16.803s FPC trunk: 1.931s I improved CompareDWord as well, for your application it should be even better. Although it's still slightly slower than comparing directly with (V1[0] = V2[0]) and ..., which is ~1.2s for all 3 FPC versions mentioned above. I guess there has to be some price for CompareMem being more general, and checking size at runtime. Yes. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Florian Klämpfl wrote: I improved CompareDWord as well, for your application it should be even better. Ha, I hesitated earlier to mention that CompareWord and CompareDWord are ~4 times slower than equivalent (on the same number of bytes) CompareMem/CompareByte calls :) Cool, times with latest FPC (rev 17651): CompareMem: 1.928s CompareWord: 1.270s CompareDWord: 1.108s These times directly help in my algorithm (I use it to aggressively search for pairs of matching edges for shadow volumes). Great! Thanks, Michalis ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
On 01.06.2011 22:07, Michalis Kamburelis wrote: Any thoughts? Maybe something can be improved? 1. Why CompareMem got slower in FPC 2.4.4? Maybe something can be fixed? Let's see... CompareMem in 2.4.2: === source begin === function CompareMem(P1, P2: Pointer; Length: cardinal): Boolean; var i: cardinal; begin Result:=True; I:=0; If (P1)(P2) then While Result and (iLength) do begin Result:=PByte(P1)^=PByte(P2)^; Inc(I); Inc(pchar(P1)); Inc(pchar(P2)); end; end; === source end === CompareMem in 2.4.4: === source begin === function CompareMem(P1, P2: Pointer; Length: cardinal): Boolean; begin Result:=CompareByte(P1^,P2^,Length)=0; end; === source end === This at least explains why CompareMem behaves similar to CompareByte in your test :D 2. The simple comparison (V1[0] = V2[0]) and... is much faster than any CompareXxx. Any chance of improving it? In this case, size is known at compile time, so maybe CompareXxx could be magic and (for reasonably small sizes) the compiler could just generate a proper code to compare them just like = operator? Just an idea of course, I don't know how easy it would be to actually implement. The CompareMem function is in unit SysUtils and I don't think it is a good idea to magicalize that unit. Maybe it's better if you or someone else would try to improve the performance of the i386 assembler code that makes up the Compare(Byte/Word/DWord) functions (it's located in rtl/i386/i386.inc btw). Regards, Sven ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
02.06.2011 19:49, Sven Barth пишет: Maybe it's better if you or someone else would try to improve the performance of the i386 assembler code that makes up the Compare(Byte/Word/DWord) functions (it's located in rtl/i386/i386.inc btw). I tend to agree. The string instructions used there (rep cmpsb, etc.) are known to be not the fastest solution on modern x86 CPUs. Sergei ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Hello FPC, Wednesday, June 1, 2011, 10:07:18 PM, you wrote: MK In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at MK least for some cases: The difference is that CompareMem takes the same time to check a memory block if the elements are equal or different, while the pascal code aborts inmediatly after first difference. I had performed a fast test filling 2 megabytes of memory, the first with value 1 and the second with value 2. Performing the test 10 millions times takes 610 milliseconds. In the second test fill both arrays with same value, and CompareMem takes 610 milliseconds. After some tests (not very professional) CompareMem takes much more advantage if the amount of bytes to be compared are more than 32-38, so the best performance I get is using this function: function XCompareByte(const p1,p2,Length: sizeInt): SizeInt; var j: SizeInt; PB1: PBYTE absolute p1; PB2: PBYTE absolute p2; begin if Length=38 then begin for j := 0 to Length-1 do begin if pb1[j]PB2[j] then begin Result:=BYTE(PB1[j]-PB2[j]); exit; end; end; Result:=0; end else begin Result:=CompareByte(p1,p2,Length); end; end; In my computer Intel 9300 QCore WindowsXP this function get the best of the two worlds, fast comparison escape for different memory blocks, fast compare for small blocks and the rep cmpsl for larger blocks, performance that can not be achieved with the pascal loop. The comparison penalty if rapidly payed with the fast escape in case of differeces in the very first bytes. I'm quite sure the magic number is 32 not 38, but in my case I get better performance with 38. Simple test code: procedure Comparememtest; const COMPARELENGTH=1000; var p1,p2: PBYTE; j: integer; ST,ET: TTimeStamp; begin GetMem(p1,100); for j := 0 to 100-1 do begin p1[j]:=0; end; GetMem(p2,100); for j := 0 to 100-1 do begin p2[j]:=0; end; ST:=DateTimeToTimeStamp(Now); for j := 0 to 1000 do begin XCompareByte(SizeInt(p1),SizeInt(p2),COMPARELENGTH); end; ET:=DateTimeToTimeStamp(Now); self.Caption:=inttostr(ET.Time-ST.Time); FreeMem(p1); FreeMem(p2); end; Times table: PlainCompare CompareByteXCompareByte --- Equal arrays 1000 elements 16250 ms 625 ms 656 ms Diff. arrays 1000 elements 62 ms 640 ms 656 ms Equal arrays 32 elements 547 ms 625 ms 547 ms Diff. arrays 32 elements62 ms 640 ms 62 ms -- Best regards, José ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Am 02.06.2011 18:45, schrieb José Mejuto: Hello FPC, PlainCompare CompareByteXCompareByte --- Equal arrays 1000 elements 16250 ms 625 ms 656 ms Diff. arrays 1000 elements 62 ms 640 ms 656 ms Equal arrays 32 elements 547 ms 625 ms 547 ms Diff. arrays 32 elements62 ms 640 ms 62 ms I improved the original CompareByte, please tell me how it works for you. function CompareByte(Const buf1,buf2;len:SizeInt):SizeInt; assembler; var saveesi,saveedi : longint; savebl : byte; asm {$ifdef REGCALL} cmpl$57,%ecx jg .LCmpbyteFull movb%bl,savebl .LCmpbyteLoop: movb(%eax),%bl cmpb(%edx),%bl leal1(%eax),%eax leal1(%edx),%edx jne .LCmpbyteExitFast decl%ecx jne .LCmpbyteLoop .LCmpbyteExitFast: seteb %al movbsavebl,%bl leave ret .LCmpbyteFull: movl%edi,saveedi movl%esi,saveesi cld movl%eax,%edi movl%edx,%esi movl%ecx,%eax {$else} movl%edi,saveedi movl%esi,saveesi cld movlbuf2,%esi { Load params} movlbuf1,%edi movllen,%eax {$endif} testl %eax,%eax {We address -1(%esi), so we have to deal with len=0} je .LCmpbyteExit cmpl$10,%eax {7 not worth aligning and go through all trouble} jl .LCmpbyte2 movl%edi,%ecx { Align on 32bits } negl%ecx{ calc bytes to align (%edi and 3) xor 3= -%edi and 3} andl$3,%ecx subl%ecx,%eax { Subtract from number of bytes to go} orl %ecx,%ecx rep cmpsb {The actual 32-bit Aligning} jne .LCmpbyte3 movl%eax,%ecx {bytes to do, divide by 4} andl$3,%eax {remainder} shrl$2,%ecx {The actual division} orl %ecx,%ecx {Sets zero flag if ecx=0 - no cmp} rep cmpsl je .LCmpbyte2 { All equal? then to the left over bytes} movl$4,%eax { Not equal. Rescan the last 4 bytes bytewise} subl%eax,%esi subl%eax,%edi .LCmpbyte2: movl%eax,%ecx {bytes still to (re)scan} orl %eax,%eax {prevent disaster in case %eax=0} .balign 4 rep cmpsb .LCmpbyte3: movzbl -1(%esi),%ecx movzbl -1(%edi),%eax // Compare failing (or equal) position subl%ecx,%eax .LCmpbyteExit: movlsaveedi,%edi movlsaveesi,%esi end; ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Hello FPC, Wednesday, June 1, 2011, 10:07:18 PM, you wrote: MK In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at MK least for some cases: Please take care with my last email, it has a bug that result in wrong speed tests, anyway after fixing the bug, the magic number for plain compare is around 28-32, so for blocks of 28-32 or less bytes, a plain compare is much faster than CompareMem even when the memory blocks are identical. -- Best regards, José ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Am 01.06.2011 22:07, schrieb Michalis Kamburelis: Hi, In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at least for some cases: I've commited an improved version in r17642 ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] CompareMem slower in FPC 2.4.4
Florian Klämpfl wrote: Am 01.06.2011 22:07, schrieb Michalis Kamburelis: Hi, In my tests, FPC 2.4.4 has much slower CompareMem than FPC 2.4.2, at least for some cases: I've commited an improved version in r17642 That's great :) I just tested with fpc from SVN (rev 17644), and can confirm that CompareMem is much faster now. It beats CompareMem from both FPC 2.4.4 and FPC 2.4.2: FPC 2.4.2: 2.635s FPC 2.4.4: 16.803s FPC trunk: 1.931s Although it's still slightly slower than comparing directly with (V1[0] = V2[0]) and ..., which is ~1.2s for all 3 FPC versions mentioned above. I guess there has to be some price for CompareMem being more general, and checking size at runtime. Thanks everyone, Michalis ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel