Re: [fpc-pascal] Effective memory allocation
Relooking at your timings and mine, it appears that you allocate 10x my count of register-size count of items and require 10x the FillChar which you need to initialize your filter array. My timing is about 80 ms and yours looks like 900 ms for 10x more register sized data, which look like the reasonable ratio since we may have difference in the way we get the timings (my timing routines beeing maybe a bit optimistic). Here I think the speed limit is the time it takes to effectively transfer the data to RAM and that whether you have a FillChar or FillQWord that ends up beeing STOSB or STOSQ, that is fully cached inside the CPU thus the limiting factor is the time it takes to move the initialized CPU cached data to the RAM.___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
2014-11-04 6:35 GMT+08:00 Nico Erfurth : > > Well, the first thing you should ask yourself is "Do I REALLY need such > a large bloom filter". Everything larger than the last level cache will > seriously harm your performance as you are going to trigger a lot of > cache and TLB misses. In general you should target for typical L1-Cache > sizes (16-32KByte) or if REALLY necessary L2-Cache-sizes > (256KByte-32MByte). Everything above that will make the performance go > down rapidly, especially in hashing-application the CPU can't prefetch > data properly as the access-patterns are erratic. > I didn't think of things like cache and TLB misses. Because I try to use BloomFilter and HASH to avoid repeated calculation and/or lookups which are time consuming. So, I don't think it will harm performance, unless bloom filter lookup is slower than the calculation, am I right? Thanks! ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
Hi, > Hi All, > > I am programming a Bloom Filter and need a high-performance way to > > On what platform are you doing this ? > > I am programming on Linux, but it will be used on both Windows and > Linux, Windows is the primary target. Well, the first thing you should ask yourself is "Do I REALLY need such a large bloom filter". Everything larger than the last level cache will seriously harm your performance as you are going to trigger a lot of cache and TLB misses. In general you should target for typical L1-Cache sizes (16-32KByte) or if REALLY necessary L2-Cache-sizes (256KByte-32MByte). Everything above that will make the performance go down rapidly, especially in hashing-application the CPU can't prefetch data properly as the access-patterns are erratic. Nico ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
2014-11-03 23:40 GMT+08:00 Adriaan van Os : > Xiangrong Fang wrote: > >> Hi All, >> >> I am programming a Bloom Filter and need a high-performance way to >> > > On what platform are you doing this ? > I am programming on Linux, but it will be used on both Windows and Linux, Windows is the primary target. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
Xiangrong Fang wrote: Hi All, I am programming a Bloom Filter and need a high-performance way to On what platform are you doing this ? Regards, Adriaan van Os ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
Am 03.11.2014 08:32 schrieb "Nico Erfurth" : > > On 02.11.14 15:05, Xiangrong Fang wrote: > > Sorry, the results in previous mail was mis-labeled. > > > > > > The result is: > > > > Using SetLength: > > > > Alloc: 9.40781697E-0001 > > Clear: 2.13420202E-0001 > > > > Using GetMemory: > > > > Alloc: 2.8100E-0005 > > Clear: 7.74975504E-0001 > > > SetLength will already clear the array before returning it, so the time > for SetLength is approximate time(GetMem) + time(Clear). Ah, right. I forgot that :) Regards, Sven ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
My results : _Ptr:=GetMem(1)18 mus, 824 ns / GetMem _Ptr:=GetMem(1) + FillChar(_Ptr^,1,0)); 81 ms / GetMem + FillChar var ArInt:array of int32; . SetLength(ArInt, 1 shr 2); 81 ms / SetLength All timings are variable within [time, time+8%] on repeated runs as is generally the case for system timings. SetLength 0's (Internally calls FillChar) the array ArInt so to do a identical comparison you need to add the FillChar to the GetMem if it is your intention to 0 the array before using it. You have to compare things that do the same thing in the end. Tested on XP Pro Win32 Sp3 Intel Core 2CPU 6400 #@ 2.13GHz 2.05 GHr., 1.00 Gb RAM ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
2014-11-03 14:39 GMT+08:00 Sven Barth : > > Would you mind to show the timings that you got for FillChar? :) Using FillChar is always about 5% (or less) faster than FillQWord when used with GetMemory, but will be around 20%-40% faster if the memory is allocated by SetLength. Additionally this switch won't help you. The memory buffer that is > allocated by SetLength is also allocated using GetMemory. So the buffer > itself *is* aligned. But you don't get the start byte of the buffer in case > of SetLength, but the first byte after the information record which might > not be aligned correctly and *no* compiler switch will help you with that. > Then the problem remains with SetLength vs. GetMemory... Why SetLength is about 1x (!) slower than GetMemory? allocating 1G memory took about 0.1 second by SetLength, but is about 1E-5 via GetMemory. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
On 02.11.14 15:05, Xiangrong Fang wrote: > Sorry, the results in previous mail was mis-labeled. > > > The result is: > > Using SetLength: > > Alloc: 9.40781697E-0001 > Clear: 2.13420202E-0001 > > Using GetMemory: > > Alloc: 2.8100E-0005 > Clear: 7.74975504E-0001 SetLength will already clear the array before returning it, so the time for SetLength is approximate time(GetMem) + time(Clear). As your test allocates a gigabyte of memory the OS will not immediatly assign physical memory to your process. The pagetable after a the allocation will contain entries which create faults when accessing the virtual addresses and then the OS will fill those pages in. So on first access a lot of time will be spend inside the OSes memory subsystem. For Linux the newly allocated memory will most likely point to a read only shared physical page which only contains zeros. On the first write to one of those virtual addresses the real page allocation will trigger and Linux assigns a new empty physical page to the virtual memory area. On Linux you can avoid this by passing MAP_POPULATE to mmap when allocating the memory. For SetLength this first access is already done when clearing the memory inside SetLength. For GetMemory it is done when your explicitly access the memory by calling FillQWord later. HTH Nico ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
On 11/02/2014 03:54 PM, Xiangrong Fang wrote: Hi All, ... Also, I usually use pointer to pass block of memory to functions. How do I implement a function with the following signature: procedure MyProc(var Buf; Len: Integer): I mean, how to handle "var Buf" inside the procedure body? You can take the address of the Buf variable by using @Buf. For example: procedure MyProc(var Buf; Len: Integer); var I: Integer; P: PByte; begin P := @Buf; for I := 0 to Len - 1 do (P+I)^ := 0; end; Nikolay ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
On 03.11.2014 02:59, Xiangrong Fang wrote: 2014-11-03 2:50 GMT+08:00 Sven Barth mailto:pascaldra...@googlemail.com>>: If you use SetLength the dynamic array consists not only of the array data, but also of an information record in front of it. This will likely lead to the data not being aligned correctly (FillQWord works best with 8-Byte alignment). So what about testing FillDWord or FillChar? Just to see whether they would be faster. It is quite strange that if I use SetLength+FillByte, it is really faster (for the FillByte), but if I use GetMemory+FillByte, it is not faster than using FillDWord or FillQWord. Then it's indeed as I suspected. The memory buffer itself that is allocated by SetLength is allocated, but the pointer returned to you (what you get as a dynamic array) has - at least for FillQWord - a bad alignment. Looking at the source though it should still be 8-Byte aligned though (the information record has a size of 8 on 32-bit systems)... *sigh* Please also note that not on every system the Fill* methods are implemented in assembler. FillChar mostly is, FillWord and FillDWord perhaps, but FillQWord mostly seldomly (it's e.g. not even implemented that way on x86_64). So especially on 32-bit system 2 32-bit moves are used each. Thus FillQWord is not always the optimal choice and I'd suggest that you time the different Fill* functions especially on different platforms. Would you mind to show the timings that you got for FillChar? :) I found this in FPC doc (for $ALIGN switch): This switch is recognized for Turbo Pascal Compatibility, but is not yet implemented. The alignment of data will be different in any case, since Free Pascalis a 32-bit compiler. It is a pity that this switch is not recognized yet. But I need to understand the last statement, will alignment be different or not? I think that last statement is ment regarding that Turbo Pascal is a 16-bit compiler which had different alignment values available than FPC needs. I don't know what TP supported back then, but I don't imagine that it supported e.g. 16-Byte alignment. Additionally this switch won't help you. The memory buffer that is allocated by SetLength is also allocated using GetMemory. So the buffer itself *is* aligned. But you don't get the start byte of the buffer in case of SetLength, but the first byte after the information record which might not be aligned correctly and *no* compiler switch will help you with that. Regards, Sven ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
2014-11-03 2:50 GMT+08:00 Sven Barth : > If you use SetLength the dynamic array consists not only of the array > data, but also of an information record in front of it. This will likely > lead to the data not being aligned correctly (FillQWord works best with > 8-Byte alignment). So what about testing FillDWord or FillChar? Just to see > whether they would be faster. > > It is quite strange that if I use SetLength+FillByte, it is really faster (for the FillByte), but if I use GetMemory+FillByte, it is not faster than using FillDWord or FillQWord. I found this in FPC doc (for $ALIGN switch): This switch is recognized for Turbo Pascal Compatibility, but is not yet implemented. The alignment of data will be different in any case, since Free Pascalis a 32-bit compiler. It is a pity that this switch is not recognized yet. But I need to understand the last statement, will alignment be different or not? Thanks! Xiangrong ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
On Sun, Nov 2, 2014 at 3:54 PM, Xiangrong Fang wrote: > Also, I usually use pointer to pass block of memory to functions. How do > I implement a function with the following signature: > > procedure MyProc(var Buf; Len: Integer): > > I mean, how to handle "var Buf" inside the procedure body? > Use "@Buf" to get a pointer to the address of the untyped variable. If you want to cast that variable to an internal "known" type and the compiler complains you can always use a pointer cast: PMyKnownType(@Buf)^. You can also use pointer arithmetic (e.g. (PElementType(@Buf) + n)^) if the variable is known to be an array. In Delphi (and probably FPC) it is allowed to pass a nil pointer to an untyped variable if you need to: begin MyProc(nil^, 0); end; The compiler understands what you are trying to do and does not complain about the blatant nil dereference. So "var Buf;" is functionally equivalent to "Buf: Pointer;". OTOH "const Buf;" is slightly different in the sense that the compiler will not let you pass the untyped variable to another routine that may change its value. -- Constantine ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
Am 02.11.2014 14:55 schrieb "Xiangrong Fang" : > > Hi All, > > I am programming a Bloom Filter and need a high-performance way to allocate and wipe large block of memory. I did the following test: > > program getmem; > {$mode objfpc}{$H+} > uses epiktimer; > const > SIZE = 1024 * 1024 * 1024; > CNT = 10; > var > a: array of Byte; > p: Pointer; > et: TEpikTimer; > i: Integer; > t1, t2: TimerData; > begin > et := TEpikTimer.Create(nil); > et.Clear(t1); et.Clear(t2); > for i := 1 to CNT do begin > et.Start(t1); > p := GetMemory(SIZE); > //SetLength(a, SIZE); > et.Stop(t1); > et.Start(t2); > FillQWord(p^, SIZE div 8, 0); > //FillQWord(a[0], SIZE div 8, 0); > et.Stop(t2); > FreeMem(p); > //a := nil; > end; > WriteLn('Alloc: ', et.Elapsed(t1) / CNT); > WriteLn('Clear: ', et.Elapsed(t2) / CNT); > end. > > The result is: > > Using GetMemory: > > Alloc: 9.40781697E-0001 > Clear: 2.13420202E-0001 > > Using SetLength: > > Alloc: 2.8100E-0005 > Clear: 7.74975504E-0001 > > It is understandable that GetMemory() is faster than SetLength(), but why the difference in FillQWord()? If you use SetLength the dynamic array consists not only of the array data, but also of an information record in front of it. This will likely lead to the data not being aligned correctly (FillQWord works best with 8-Byte alignment). So what about testing FillDWord or FillChar? Just to see whether they would be faster. > > Also, I usually use pointer to pass block of memory to functions. How do I implement a function with the following signature: > > procedure MyProc(var Buf; Len: Integer): > > I mean, how to handle "var Buf" inside the procedure body? You need to cast Buf to a pointer if I remember correctly. Take a look at the implementation of a TStream's descendant's Write method (e.g. TStringStream) for an example. Regards, Sven ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Effective memory allocation
Sorry, the results in previous mail was mis-labeled. The result is: Using SetLength: Alloc: 9.40781697E-0001 Clear: 2.13420202E-0001 Using GetMemory: Alloc: 2.8100E-0005 Clear: 7.74975504E-0001 ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal