Re: [fpc-devel] CMem allocator memory alignment experiment
On 19.11.2014 12:32, Karoly Balogh (Charlie/SGR) wrote:
> Hi,
>
> On Wed, 19 Nov 2014, Jonas Maebe wrote:
>
>>> Since the RTL's allocator is documented to align to 16 bytes
>>
>> Where?
>
> Ok, that's actually a very good question. :) I didn't find it anywhere,
> except some earlier ML/forum posts revealed by Google.
>
> However, in practice it still seems to align to 16 bytes, and I asked
> several people (compiler, RTL, Lazarus developers) during the FPC/Lazarus
> conference last weekend in the Netherlands and the consensus was, it's
> known the heap manager aligns to 16 bytes, it's designed to do that, and
> in general it's a feature, which should be documented if it's not.
>
> But yeah, everyone appended "but better ask Jonas". :)

From heap.inc:

{ we need to align the user pointers to 8 byte at least for
  mmx/sse and doubles on sparc, align to 16 bytes }

Cryptic though :)

___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] CMem allocator memory alignment experiment
Sergei Gorelkin wrote:
> 19.11.2014 15:16, Marco van de Voort wrote:
>> In our previous episode, Mark Morgan Lloyd said:
>>>>> introduces a very significant performance overhead;
>>>>
>>>> Linux also does this.
>>>
>>> On some but by no means all platforms. I'm specifically trying to
>>> highlight the fact that on SPARC, Solaris can fix alignment issues (at a
>>> price) but Linux doesn't try to. I don't know to what extent there are
>>> comparable issues on other platforms (in particular x86_64) for which
>>> both Solaris and Linux are implemented.
>>
>> On PPC (603), Linux did, but NetBSD didn't :-) Though that is 1.9.0 times
>> experience and thus slightly dated.
>
> On mips-linux, I observe no crashes when doing unaligned access with
> integer instructions, but it still crashes upon unaligned access using
> floating-point instructions.
>
> Sergei

Useful to know. I've never tried this, but presumably the signal can be trapped?

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Re: [fpc-devel] CMem allocator memory alignment experiment
19.11.2014 15:16, Marco van de Voort wrote:
> In our previous episode, Mark Morgan Lloyd said:
> > > > introduces a very significant performance overhead;
> > >
> > > Linux also does this.
> >
> > On some but by no means all platforms. I'm specifically trying to
> > highlight the fact that on SPARC, Solaris can fix alignment issues (at a
> > price) but Linux doesn't try to. I don't know to what extent there are
> > comparable issues on other platforms (in particular x86_64) for which
> > both Solaris and Linux are implemented.
>
> On PPC (603), Linux did, but NetBSD didn't :-) Though that is 1.9.0 times
> experience and thus slightly dated.

On mips-linux, I observe no crashes when doing unaligned access with integer instructions, but it still crashes upon unaligned access using floating-point instructions.

Sergei
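For code that has to touch floating-point data at addresses that may be misaligned, one common way to sidestep the kind of crash reported above is to go through `memcpy` instead of dereferencing a `double *`. The following is only a minimal C sketch of that idea; the function names are made up for illustration, and it says nothing about how the FPC RTL or the kernel handle the fault.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a double from an address that may not be suitably aligned.
   memcpy leaves it to the compiler to emit loads that are safe for the
   target, instead of a single floating-point load that could trap. */
double load_double_unaligned(const void *p)
{
    double value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}

/* Write a double to a possibly unaligned address, using the same idea. */
void store_double_unaligned(void *p, double value)
{
    assert(p != NULL);
    memcpy(p, &value, sizeof value);
}
```

With a byte buffer `buf`, `store_double_unaligned(buf + 1, 3.5)` followed by `load_double_unaligned(buf + 1)` round-trips the value even though `buf + 1` is very unlikely to be aligned for a `double`.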
Re: [fpc-devel] CMem allocator memory alignment experiment
On 19 Nov 2014, at 12:50, Karoly Balogh (Charlie/SGR) wrote:

> (On a slightly related note, did anyone run current trunk compiler with
> cmem allocator through valgrind recently? I seem to get quite some "using
> uninitialized memory" hits.)

I don't know whether it's still the case, but in the past you had to disable all SSE2-based helpers in the RTL (move, and maybe some others), because valgrind didn't properly emulate some SSE2 store instructions that we use.

Jonas
Re: [fpc-devel] CMem allocator memory alignment experiment
Marco van de Voort wrote:

> Since cmem is documented to be used from the main program file (iow the
> user's code), that would nicely put the responsibility there.

That might be where it's imported, but it's heavily used by just about everything when non-scalar types are being shared between a dynamically-loaded library (DLL or so) and the main program.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Re: [fpc-devel] CMem allocator memory alignment experiment
In our previous episode, Mark Morgan Lloyd said:
> >> introduces a very significant performance overhead;
> >
> > Linux also does this.
>
> On some but by no means all platforms. I'm specifically trying to
> highlight the fact that on SPARC, Solaris can fix alignment issues (at a
> price) but Linux doesn't try to. I don't know to what extent there are
> comparable issues on other platforms (in particular x86_64) for which
> both Solaris and Linux are implemented.

On PPC (603), Linux did, but NetBSD didn't :-) Though that is 1.9.0 times experience and thus slightly dated.
Re: [fpc-devel] CMem allocator memory alignment experiment
In our previous episode, Karoly Balogh (Charlie/SGR) said:
> > > Since the RTL's allocator is documented to align to 16 bytes, it's
> > > definitely an issue also with Pascal code. We do have code where also
> > > Pascal side triggers the alignment issue, but indeed, the main issue is
> > > with linked C libs, depending on pointers from the Pascal side.
> >
> > I'm not an alignment expert, but only when loading types that are larger
> > than the pointer size, since only those are not naturally aligned and so
> > could cross cacheline boundaries?
>
> Yes, but the problem is, you have no idea what the underlying library
> does, and GCC seems to compile code where it thinks it would be more
> optimal which expects the malloc()-alike alignment at least.

My point exactly. If tomorrow GCC 5 changes to 32-byte alignment on Intel, we are back to where we started.

> > Anyway, I don't see a problem with having a cmemalign16 (or -32 or whatever)
> > unit, but I wouldn't blow up allocation unnecessarily if there hasn't been a
> > real problem in most cases.
>
> Well, I would still fix the original cmem to not destroy the underlying
> malloc alignment, but that patch should be much less invasive. Another
> idea would be to add a simple helper to the RTL, to allocate/free an
> aligned memory block, something libc already has, IIRC.

So we provide multiple choices, and during problems people can fix their issues by changing unit.

Administrating and releasing a holy default cmem in a target-specific way is IMHO a bridge too far, though maybe make an exception for targets like x86_64 that really universally align to 16 bytes.

Attempting to do so only provides temporary relief from alignment issues and triggers complaints from people that see their memory use increase (alignment + pointer) without having a problem in the first place.

> > Since cmem is documented to be used from the main program file (iow the
> > user's code), that would nicely put the responsibility there.
>
> Yes, but this still doesn't answer the question why my cmem16 doesn't work
> for complex apps, while it seems to pass all simple heap testcases. :)

I assumed language helpers would access the allocated size in Delphi. But then I can't understand why your code wouldn't work, since the size is still directly before the allocation.

> (On a slightly related note, did anyone run current trunk compiler with
> cmem allocator through valgrind recently? I seem to get quite some "using
> uninitialized memory" hits.)

Not me. Last did that about 1.5 years ago just after 2.6.x to hunt fpdoc bugs.
Re: [fpc-devel] CMem allocator memory alignment experiment
Karoly Balogh (Charlie/SGR) wrote:

> > Perhaps the most serious scenario is where an architecture or particular
> > implementation requires alignment, but the kernel traps alignment errors and
> > fixes them silently. SPARC Solaris does this and my understanding is that it
> > introduces a very significant performance overhead;
>
> Linux also does this.

On some but by no means all platforms. I'm specifically trying to highlight the fact that on SPARC, Solaris can fix alignment issues (at a price) but Linux doesn't try to. I don't know to what extent there are comparable issues on other platforms (in particular x86_64) for which both Solaris and Linux are implemented.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Re: [fpc-devel] CMem allocator memory alignment experiment
Hi,

On Wed, 19 Nov 2014, Marco van de Voort wrote:

> > Since the RTL's allocator is documented to align to 16 bytes, it's
> > definitely an issue also with Pascal code. We do have code where also
> > Pascal side triggers the alignment issue, but indeed, the main issue is
> > with linked C libs, depending on pointers from the Pascal side.
>
> I'm not an alignment expert, but only when loading types that are larger
> than the pointer size, since only those are not naturally aligned and so
> could cross cacheline boundaries?

Yes, but the problem is, you have no idea what the underlying library does, and GCC seems to compile code where it thinks it would be more optimal which expects the malloc()-alike alignment at least.

> Anyway, I don't see a problem with having a cmemalign16 (or -32 or whatever)
> unit, but I wouldn't blow up allocation unnecessarily if there hasn't been a
> real problem in most cases.

Well, I would still fix the original cmem to not destroy the underlying malloc alignment, but that patch should be much less invasive. Another idea would be to add a simple helper to the RTL, to allocate/free an aligned memory block, something libc already has, IIRC.

> Since cmem is documented to be used from the main program file (iow the
> user's code), that would nicely put the responsibility there.

Yes, but this still doesn't answer the question why my cmem16 doesn't work for complex apps, while it seems to pass all simple heap testcases. :)

(On a slightly related note, did anyone run current trunk compiler with cmem allocator through valgrind recently? I seem to get quite some "using uninitialized memory" hits.)

Charlie
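One way to read "fix the original cmem to not destroy the underlying malloc alignment" is to pad the size header up to malloc's own alignment, so the user pointer is aligned exactly like the pointer malloc returned. The C sketch below illustrates that reading only; it is not the cmem16 code from the gist and not the RTL's implementation, and `HDR_SIZE`, `padded_getmem`, `padded_memsize` and `padded_freemem` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header padded to the strictest fundamental alignment, so that
   "malloc pointer + HDR_SIZE" is aligned like the malloc pointer itself. */
#define HDR_SIZE (_Alignof(max_align_t) > sizeof(size_t) \
                      ? _Alignof(max_align_t) : sizeof(size_t))

/* Allocate `size` user bytes; the size is stored directly in front of
   the returned pointer, the rest of the header is padding. */
void *padded_getmem(size_t size)
{
    unsigned char *raw = malloc(HDR_SIZE + size);
    if (raw == NULL)
        return NULL;
    ((size_t *)(raw + HDR_SIZE))[-1] = size;
    return raw + HDR_SIZE;
}

/* Return the size recorded for a block obtained from padded_getmem. */
size_t padded_memsize(const void *p)
{
    assert(p != NULL);
    return ((const size_t *)p)[-1];
}

/* Release a block obtained from padded_getmem. */
void padded_freemem(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - HDR_SIZE);
}
```

The price is the one Marco objects to: on a typical 64-bit system every block would carry a 16-byte header instead of an 8-byte one, whether or not the caller needed the stricter alignment.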
Re: [fpc-devel] CMem allocator memory alignment experiment
Hi,

On Wed, 19 Nov 2014, Jonas Maebe wrote:

> > Since the RTL's allocator is documented to align to 16 bytes
>
> Where?

Ok, that's actually a very good question. :) I didn't find it anywhere, except some earlier ML/forum posts revealed by Google.

However, in practice it still seems to align to 16 bytes, and I asked several people (compiler, RTL, Lazarus developers) during the FPC/Lazarus conference last weekend in the Netherlands and the consensus was, it's known the heap manager aligns to 16 bytes, it's designed to do that, and in general it's a feature, which should be documented if it's not.

But yeah, everyone appended "but better ask Jonas". :)

Charlie
Re: [fpc-devel] CMem allocator memory alignment experiment
In our previous episode, Karoly Balogh (Charlie/SGR) said:
> > > At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
> > > 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
> > > to 4 bytes and 8 bytes, respectively.
> >
> > Since cmem is intended for use by FPC, I don't see this as a serious issue
> > unless somebody is exchanging malloc()ed blocks between Pascal and C code.
>
> Since the RTL's allocator is documented to align to 16 bytes, it's
> definitely an issue also with Pascal code. We do have code where also
> Pascal side triggers the alignment issue, but indeed, the main issue is
> with linked C libs, depending on pointers from the Pascal side.

I'm not an alignment expert, but only when loading types that are larger than the pointer size, since only those are not naturally aligned and so could cross cacheline boundaries?

Note that 16 bytes is not enough: certain AVX instructions are iirc documented to require 32-byte alignment, and it is also recommended for speed (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors).

Anyway, I don't see a problem with having a cmemalign16 (or -32 or whatever) unit, but I wouldn't blow up allocation unnecessarily if there hasn't been a real problem in most cases.

Since cmem is documented to be used from the main program file (iow the user's code), that would nicely put the responsibility there.
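For callers that really need a specific alignment such as 32 bytes, the C library can be asked for it directly. The following is a small sketch using C11 `aligned_alloc`; the wrapper name `alloc_aligned` is made up here, and nothing in it reflects how a cmemalign16/-32 unit would actually be written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least `size` bytes at an address that is a multiple of
   `alignment` (which must be a power of two). C11 aligned_alloc expects
   the size to be a multiple of the alignment, so it is rounded up first.
   The block is released with plain free(). */
void *alloc_aligned(size_t size, size_t alignment)
{
    size_t rounded;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
}
```

A call such as `alloc_aligned(100, 32)` reserves 128 bytes, which shows the memory cost of insisting on the stricter alignment for every block.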
Re: [fpc-devel] CMem allocator memory alignment experiment
On 19 Nov 2014, at 11:49, Karoly Balogh (Charlie/SGR) wrote:

> Since the RTL's allocator is documented to align to 16 bytes

Where? At least http://www.freepascal.org/docs-html/prog/progsu173.html only says that the size is rounded up to a multiple of 16/32 bytes; it doesn't say anything about the alignment. Besides, that page is also very much out of date (there is no such thing as "HeapPtr" anymore), and I don't think we should be documenting that kind of stuff at all since it can change at any time (only the interface is relevant/defined).

If there would be a guaranteed alignment (which there isn't at this time, afaik), then that would be part of the interface contract, of course.

Jonas
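The distinction between a rounded-up size and an aligned address can be shown in a few lines of C. This is only an illustration of the two properties, with hypothetical helper names; it makes no claim about what FPC's heap manager actually guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a requested size up to the next multiple of `granule`
   (a power of two). This is a property of the block length only. */
size_t round_up_size(size_t size, size_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return (size + granule - 1) & ~(granule - 1);
}

/* Non-zero when the address itself is a multiple of `alignment`.
   This is a property of where the block starts. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}
```

A block whose size was rounded from 24 to 32 bytes can still start at an address that is 8 bytes past a 16-byte boundary, so the first property does not imply the second.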
Re: [fpc-devel] CMem allocator memory alignment experiment
Hi,

On Wed, 19 Nov 2014, Mark Morgan Lloyd wrote:

> > At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
> > 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
> > to 4 bytes and 8 bytes, respectively.
>
> Since cmem is intended for use by FPC, I don't see this as a serious issue
> unless somebody is exchanging malloc()ed blocks between Pascal and C code.

Since the RTL's allocator is documented to align to 16 bytes, it's definitely an issue also with Pascal code. We do have code where also Pascal side triggers the alignment issue, but indeed, the main issue is with linked C libs, depending on pointers from the Pascal side.

> > This causes multiple performance and other issues, especially on
> > processors which require stricter alignment (most ARM CPUs, but also x86
> > with SSE, etc).
>
> I'm not sure to what extent this remains an issue with current ARM chips. I've
> got limited ARM hardware, but some tests that I did with somebody else a few
> months ago didn't show up any issues.

We do have limited ARM hardware, based on older ARM cores where this is an issue. We use FPC in production on these chips, so it's an issue for us. And since these cores remain in production for the coming years (not just for us, but in general), the compiler and libs have to deal with it.

> Perhaps the most serious scenario is where an architecture or particular
> implementation requires alignment, but the kernel traps alignment errors and
> fixes them silently. SPARC Solaris does this and my understanding is that it
> introduces a very significant performance overhead;

Linux also does this. Actually there's plenty of hardware where this is an issue. Almost all "RISC" chips, especially embedded ones, do have alignment restrictions to some degree. I know older PPC and recent Power chips have them as well.

And even the fastest CPUs have some performance penalty when doing unaligned accesses, even if the hardware solves it itself and it doesn't involve the software side.

> ARM Linux may also do it (where demanded by the hardware) but my
> understanding is that notifications can be enabled.

Yes, we have these notifications enabled, and we're flooded with them when using the cmem allocator. This is why I started working on this.

Charlie
Re: [fpc-devel] CMem allocator memory alignment experiment
Karoly Balogh (Charlie/SGR) wrote:

> Hi,
>
> I think there are several issues with the cmem memory allocator. The main
> issue is that it "breaks" the underlying malloc() memory alignment, by
> adding a four/eight byte size value to the start of each block for the
> sole reason to be able to throw Runtime Error 204 in case someone tries to
> free a block with the wrong size.
>
> At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
> 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
> to 4 bytes and 8 bytes, respectively.

Since cmem is intended for use by FPC, I don't see this as a serious issue unless somebody is exchanging malloc()ed blocks between Pascal and C code. However I'm not saying that it's not worth fixing.

> This causes multiple performance and other issues, especially on
> processors which require stricter alignment (most ARM CPUs, but also x86
> with SSE, etc).

I'm not sure to what extent this remains an issue with current ARM chips. I've got limited ARM hardware, but some tests that I did with somebody else a few months ago didn't show up any issues.

It's more of a problem with SPARC, particularly on Linux, but that's rapidly going down the tubes as a viable platform, in part because this very issue breaks a lot of stuff and maintainers have neither hardware nor incentive to investigate.

Perhaps the most serious scenario is where an architecture or particular implementation requires alignment, but the kernel traps alignment errors and fixes them silently. SPARC Solaris does this and my understanding is that it introduces a very significant performance overhead; ARM Linux may also do it (where demanded by the hardware) but my understanding is that notifications can be enabled.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
[fpc-devel] CMem allocator memory alignment experiment
Hi,

I think there are several issues with the cmem memory allocator. The main issue is that it "breaks" the underlying malloc() memory alignment, by adding a four/eight byte size value to the start of each block for the sole reason to be able to throw Runtime Error 204 in case someone tries to free a block with the wrong size.

At least on Linux, malloc() is documented to align to 64 bit on 32 bit and 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that to 4 bytes and 8 bytes, respectively.

This causes multiple performance and other issues, especially on processors which require stricter alignment (most ARM CPUs, but also x86 with SSE, etc).

I created a cmem variant, which does 16 byte alignment of the returned memory blocks, just like FPC's own Heap Manager does:

https://gist.github.com/chainq/6f69a7821cfa2503962f

However, when I build FPC with this cmem16 allocator, the compiler explodes. It also fails with other larger pieces of code, and I'm unsure why. I spent a few days debugging, but I couldn't find the issue. Ideas?

I wanted to contribute the code to the FPC SVN (after some cleanup) but because of these issues, I couldn't. Yes, the current alignment code is not the most optimal and wastes some memory, but at least it should work. Must be something trivial.

Ideas, opinions, suggestions welcomed.

Charlie
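To make the alignment loss described above concrete, here is a minimal C sketch of an allocator that stores a size field in front of each block. It is not the cmem source (which is Pascal) and only mimics the scheme as described in this post; `hdr_getmem`, `hdr_freemem` and `misalignment` are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cmem-like wrapper: a size_t header sits directly in
   front of the pointer that is handed to the caller. */
void *hdr_getmem(size_t size)
{
    size_t *raw = malloc(sizeof(size_t) + size);
    if (raw == NULL)
        return NULL;
    raw[0] = size;      /* remembered so a later free can check it */
    return raw + 1;     /* user pointer = malloc pointer + header */
}

/* Free a block from hdr_getmem. Returns 0 on success, or 204 (mirroring
   the runtime error number) when the caller passes the wrong size. */
int hdr_freemem(void *p, size_t size)
{
    size_t *raw = (size_t *)p - 1;
    if (raw[0] != size)
        return 204;
    free(raw);
    return 0;
}

/* Distance of p from the previous multiple of `alignment`; 0 = aligned. */
size_t misalignment(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (size_t)((uintptr_t)p % alignment);
}
```

On a typical 64-bit Linux system, where malloc returns 16-byte-aligned blocks and `size_t` is 8 bytes, `misalignment(hdr_getmem(64), 16)` would be expected to come out as 8, which is the drop from 128-bit to 8-byte alignment mentioned above.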