[fpc-devel] cmem not aligning memory

2010-04-03 Thread C Western
I notice that the cmem unit does not align memory in the same way as the default unit - removing the cmem unit makes a factor of two difference in the speed of some double precision matrix code. (My system is i386). Inspecting the cmem unit indicates the issue is the extra bytes allocated for
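To illustrate the effect described in this post, here is a minimal C sketch. It assumes the wrapper stores the block size in one pointer-sized word directly in front of the pointer it hands out. The names prefixed_alloc, prefixed_size, prefixed_free and misalignment are hypothetical and are not the cmem unit's actual routines.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper, not the real cmem code: one size_t header
   holding the requested size sits directly before the user data. */
static void *prefixed_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(size_t))
        return NULL;                      /* header + size would overflow */
    size_t *raw = malloc(sizeof(size_t) + size);
    if (raw == NULL)
        return NULL;
    raw[0] = size;                        /* remember the block size */
    return raw + 1;                       /* user data starts after the header */
}

/* Read the size back from the header in front of the user data. */
static size_t prefixed_size(const void *p)
{
    return ((const size_t *)p)[-1];
}

static void prefixed_free(void *p)
{
    if (p != NULL)
        free((size_t *)p - 1);            /* step back to what malloc returned */
}

/* Distance of a pointer from the previous multiple of `alignment`. */
static unsigned misalignment(const void *p, size_t alignment)
{
    return (unsigned)((uintptr_t)p % alignment);
}
```

With a 4-byte header, as on i386, misalignment(p, 8) is 4 whenever malloc itself returned an 8-byte-aligned block, so every double in the block straddles an 8-byte boundary. That would be consistent with the slowdown reported above.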

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Jonas Maebe
On 03 Apr 2010, at 13:00, C Western wrote:
> I notice that the cmem unit does not align memory in the same way as the default unit - removing the cmem unit makes a factor of two difference in the speed of some double precision matrix code. (My system is i386). Inspecting the cmem unit

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Micha Nelissen
C Western wrote:
> Inspecting the cmem unit indicates the issue is the extra bytes allocated for the count - is this really needed? Or do we have to allocate more bytes for blocks that are a multiple of 8?
Do C memory managers guarantee any alignment anyway? Not for SSE (16 bytes) I'm sure,

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Marco van de Voort
In our previous episode, Jonas Maebe said:
> > Or do we have to allocate more bytes for blocks that are a multiple of 8?
> FPC's default memory manager even guarantees 16 byte alignment (for vectors).
So a possible solution is to allocate 16-sizeof(ptruint) bytes more? for 32-bit that would mean:
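The idea of allocating 16-sizeof(ptruint) extra bytes amounts to padding the size header to a full 16 bytes, so the user pointer keeps whatever alignment (up to 16) the underlying allocator provided. A minimal C sketch under that reading, using size_t as a stand-in for ptruint; HEADER_SIZE, padded_alloc, padded_size and padded_free are hypothetical names, not code from the cmem unit:

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the size word is padded out to 16 bytes in total. */
enum { HEADER_SIZE = 16 };

_Static_assert(HEADER_SIZE >= sizeof(size_t), "header must hold the size");

static void *padded_alloc(size_t size)
{
    if (size > SIZE_MAX - HEADER_SIZE)
        return NULL;                       /* header + size would overflow */
    unsigned char *raw = malloc(HEADER_SIZE + size);
    if (raw == NULL)
        return NULL;
    *(size_t *)raw = size;                 /* size lives at the start of the header */
    return raw + HEADER_SIZE;              /* same alignment as raw, modulo 16 */
}

/* Read the size back from the start of the 16-byte header. */
static size_t padded_size(const void *p)
{
    return *(const size_t *)((const unsigned char *)p - HEADER_SIZE);
}

static void padded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - HEADER_SIZE);
}
```

The cost is 12 wasted bytes per block on a 32-bit target (8 on a 64-bit one). Note that this only preserves the alignment malloc already gives; if malloc returns blocks aligned to 8 bytes, the user pointer is aligned to 8, not 16.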

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Jonas Maebe
On 03 Apr 2010, at 14:09, Micha Nelissen wrote:
> Do C memory managers guarantee any alignment anyway? Not for SSE (16 bytes) I'm sure, but 8 bytes I don't know.
From Linux' malloc man page: For calloc() and malloc(), the value returned is a pointer to the allocated memory, which is
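Where a specific alignment is required regardless of what malloc happens to provide, C11 has aligned_alloc. The following is a minimal sketch of requesting 16-byte-aligned storage, for example for SSE vectors; alloc_aligned16 and is_aligned are hypothetical helper names introduced only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return a 16-byte-aligned block of at least `size` bytes, or NULL.
   C11 requires the size passed to aligned_alloc to be a multiple of
   the alignment, so the request is rounded up first. */
static void *alloc_aligned16(size_t size)
{
    if (size > SIZE_MAX - 15u)
        return NULL;                       /* rounding up would overflow */
    size_t rounded = (size + 15u) & ~(size_t)15u;
    if (rounded == 0)
        rounded = 16;                      /* avoid a zero-sized request */
    return aligned_alloc(16, rounded);     /* release with free() */
}

/* Non-zero if p is a multiple of `alignment`. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A block obtained this way is released with plain free(). A wrapper that also needs a size header in front of the data would still have to pad that header to a multiple of 16 bytes to keep the alignment.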

Re: [fpc-devel] cmem not aligning memory

2010-04-03 Thread Michalis Kamburelis
Marco van de Voort wrote:
> In our previous episode, Jonas Maebe said:
> > > Or do we have to allocate more bytes for blocks that are a multiple of 8?
> > FPC's default memory manager even guarantees 16 byte alignment (for vectors).
> So a possible solution is to allocate 16-sizeof(ptruint) bytes more?