Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-23 Thread Florian Klaempfl
On 19.11.2014 12:32, Karoly Balogh (Charlie/SGR) wrote:
> Hi,
> 
> On Wed, 19 Nov 2014, Jonas Maebe wrote:
> 
>>> Since the RTL's allocator is documented to align to 16 bytes
>>
>> Where?
> 
> Ok, that's actually a very good question. :) I didn't find it anywhere,
> except some earlier ML/forum posts revealed by Google.
> 
> However, in practice it still seems to align to 16 bytes, and I asked
> several people (compiler, RTL, Lazarus developers) during the FPC/Lazarus
> conference last weekend in the Netherlands and the consensus was, it's
> known the heap manager aligns to 16 bytes, it's designed to do that, and
> in general it's a feature, which should be documented if it's not.
> 
> But yeah, everyone appended "but better ask Jonas". :)

From heap.inc:

  { we need to align the user pointers to 8 byte at least for mmx/sse and doubles on sparc, align to 16 bytes }

Cryptic though :)
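
The rounding that comment describes can be sketched in C. These are hypothetical helpers for illustration, not the RTL's actual code, and they assume the alignment is a power of two (which both 8 and 16 are):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from heap.inc): round x up to the next
   multiple of align. align must be a nonzero power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Nonzero if x is a multiple of align (same power-of-two precondition). */
static int is_aligned(uintptr_t x, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x & (align - 1)) == 0;
}
```

For example, `align_up(17, 16)` gives 32 while `align_up(16, 16)` stays 16. Because 16 is a multiple of 8, a 16-byte-aligned pointer also satisfies the 8-byte requirement the comment mentions.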
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Mark Morgan Lloyd

Sergei Gorelkin wrote:
> 19.11.2014 15:16, Marco van de Voort wrote:
>> In our previous episode, Mark Morgan Lloyd said:
>>>>> introduces a very significant performance overhead;
>>>>
>>>> Linux also does this.
>>>
>>> On some but by no means all platforms. I'm specifically trying to
>>> highlight the fact that on SPARC, Solaris can fix alignment issues (at a
>>> price) but Linux doesn't try to. I don't know to what extent there are
>>> comparable issues on other platforms (in particular x86_64) for which
>>> both Solaris and Linux are implemented.
>>
>> On PPC (603), Linux did, but netbsd didn't :-)  Though that is 1.9.0 times
>> experience and thus slightly dated.
>
> On mips-linux, I observe no crashes when doing unaligned access with
> integer instructions, but it still crashes upon unaligned access using
> floating-point instructions.
>
> Sergei


Useful to know. I've never tried this, but presumably the signal can be 
trapped?


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Sergei Gorelkin

19.11.2014 15:16, Marco van de Voort wrote:

> In our previous episode, Mark Morgan Lloyd said:
>>>> introduces a very significant performance overhead;
>>>
>>> Linux also does this.
>>
>> On some but by no means all platforms. I'm specifically trying to
>> highlight the fact that on SPARC, Solaris can fix alignment issues (at a
>> price) but Linux doesn't try to. I don't know to what extent there are
>> comparable issues on other platforms (in particular x86_64) for which
>> both Solaris and Linux are implemented.
>
> On PPC (603), Linux did, but netbsd didn't :-)  Though that is 1.9.0 times
> experience and thus slightly dated.


On mips-linux, I observe no crashes when doing unaligned access with integer instructions, but it 
still crashes upon unaligned access using floating-point instructions.
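
For code that must read a double from a possibly unaligned address (for instance, inside a packed buffer), a common portable C idiom is to `memcpy` the bytes into a properly aligned local rather than dereference a misaligned `double *`; compilers generally turn this into a load sequence that is safe for the target. The helper names are hypothetical, and the idiom is a general one, not something proposed in this thread:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: read a double stored at any byte address.
   Copying into an aligned local avoids issuing a misaligned
   floating-point load, which is what crashes on mips-linux per the
   report above. */
static double load_double_unaligned(const unsigned char *p)
{
    double d;
    assert(p != NULL);
    memcpy(&d, p, sizeof d);
    return d;
}

/* Matching store: copy the bytes of d to any byte address. */
static void store_double_unaligned(unsigned char *p, double d)
{
    assert(p != NULL);
    memcpy(p, &d, sizeof d);
}
```

This sidesteps the floating-point trap instead of relying on the kernel to catch and fix it.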


Sergei



Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Jonas Maebe


On 19 Nov 2014, at 12:50, Karoly Balogh (Charlie/SGR) wrote:

> (On a slightly related note, did anyone run current trunk compiler with
> cmem allocator through valgrind recently? I seem to get quite some "using
> uninitialized memory" hits.)


I don't know whether it's still the case, but in the past you had to  
disable all SSE2-based helpers in the RTL (move, and maybe some  
others), because valgrind didn't properly emulate some SSE2 store  
instructions that we use.



Jonas


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Mark Morgan Lloyd

Marco van de Voort wrote:

> Since cmem is documented to be used from the main program file (iow the
> user's code), that would nicely put the responsibility there.

That might be where it's imported, but it's heavily used by just about 
everything when non-scalar types are being shared between a 
dynamically-loaded library (DLL or .so) and the main program.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Marco van de Voort
In our previous episode, Mark Morgan Lloyd said:
> >> introduces a very significant performance overhead;
> > 
> > Linux also does this.
> 
> On some but by no means all platforms. I'm specifically trying to 
> highlight the fact that on SPARC, Solaris can fix alignment issues (at a 
> price) but Linux doesn't try to. I don't know to what extent there are 
> comparable issues on other platforms (in particular x86_64) for which 
> both Solaris and Linux are implemented.

On PPC (603), Linux did, but netbsd didn't :-)  Though that is 1.9.0 times
experience and thus slightly dated.


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Marco van de Voort
In our previous episode, Karoly Balogh (Charlie/SGR) said:
> 
> > > Since the RTL's allocator is documented to align to 16 bytes, it's
> > > definitely an issue also with Pascal code. We do have code where also
> > > Pascal side triggers the alignment issue, but indeed, the main issue is
> > > with linked C libs, depending on pointers from the Pascal side.
> >
> > I'm not an alignment expert, but only when loading types that are larger
> > than the pointer size, since only those are not naturally aligned and so
> > could cross cacheline boundaries?
> 
> Yes, but the problem is, you have no idea what the underlying library
> does, and GCC seems to compile code where it thinks it would be more
> optimal which expects the malloc()-alike alignment at least.

My point exactly. If tomorrow GCC 5 changes to 32-byte alignment on Intel,
we are back to where we started.
 
> > Anyway, I don't see a problem with having a cmemalign16 (or -32 or whatever)
> > unit, but I wouldn't blow up allocation unnecessarily if there hasn't been a
> > real problem in most cases.
> 
> Well, I would still fix the original cmem to not destroy the underlying
> malloc alignment, but that patch should be much less invasive. Another
> idea would be to add a simple helper to the RTL, to allocate/free an
> aligned memory block, something libc already has, IIRC.

So we provide multiple choices, and during problems people can fix their
issues by changing unit.

Administrating and releasing a holy default cmem in a target-specific way is
IMHO a bridge too far, though maybe make an exception for targets like
x86_64 that really universally align to 16 bytes.

Attempting to do so only provides temporary relief from alignment issues
and triggers complaints from people who see their memory use increase
(alignment + pointer) without having had a problem in the first place.
 
> > Since cmem is documented to be used from the main program file (iow the
> > user's code), that would nicely put the responsibility there.
> 
> Yes, but this still doesn't answer the question why my cmem16 doesn't work
> for complex apps, while it seems to pass all simple heap testcases. :)

I assumed language helpers would access the allocated size in Delphi. But then
I can't understand why your code wouldn't work, since the size is still
directly before the allocation.
 
> (On a slightly related note, did anyone run current trunk compiler with
> cmem allocator through valgrind recently? I seem to get quite some "using
> uninitialized memory" hits.)

Not me. Last did that about 1.5 years ago just after 2.6.x to hunt fpdoc
bugs.


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Mark Morgan Lloyd

Karoly Balogh (Charlie/SGR) wrote:

>> Perhaps the most serious scenario is where an architecture or particular
>> implementation requires alignment, but the kernel traps alignment errors and
>> fixes them silently. SPARC Solaris does this and my understanding is that it
>> introduces a very significant performance overhead;
>
> Linux also does this.

On some but by no means all platforms. I'm specifically trying to 
highlight the fact that on SPARC, Solaris can fix alignment issues (at a 
price) but Linux doesn't try to. I don't know to what extent there are 
comparable issues on other platforms (in particular x86_64) for which 
both Solaris and Linux are implemented.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Karoly Balogh (Charlie/SGR)
Hi,

On Wed, 19 Nov 2014, Marco van de Voort wrote:

> > Since the RTL's allocator is documented to align to 16 bytes, it's
> > definitely an issue also with Pascal code. We do have code where also
> > Pascal side triggers the alignment issue, but indeed, the main issue is
> > with linked C libs, depending on pointers from the Pascal side.
>
> I'm not an alignment expert, but only when loading types that are larger
> than the pointer size, since only those are not naturally aligned and so
> could cross cacheline boundaries?

Yes, but the problem is, you have no idea what the underlying library
does, and GCC seems to compile code where it thinks it would be more
optimal which expects the malloc()-alike alignment at least.

> Anyway, I don't see a problem with having a cmemalign16 (or -32 or whatever)
> unit, but I wouldn't blow up allocation unnecessarily if there hasn't been a
> real problem in most cases.

Well, I would still fix the original cmem to not destroy the underlying
malloc alignment, but that patch should be much less invasive. Another
idea would be to add a simple helper to the RTL, to allocate/free an
aligned memory block, something libc already has, IIRC.
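
For reference, libc does offer this: POSIX specifies `posix_memalign()` (and C11 adds `aligned_alloc()`), and the memory it returns is released with ordinary `free()`. A minimal C sketch of such an allocate/free pair, assuming a POSIX system; the function names are hypothetical and are not an existing RTL API:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate size bytes aligned to align, or return
   NULL on failure. posix_memalign requires align to be a power of two
   and a multiple of sizeof(void *). */
static void *alloc_aligned(size_t size, size_t align)
{
    void *p = NULL;
    assert(align != 0 && align % sizeof(void *) == 0 &&
           (align & (align - 1)) == 0);
    if (posix_memalign(&p, align, size) != 0)
        return NULL;               /* EINVAL or ENOMEM */
    return p;
}

/* Memory from posix_memalign is released with the ordinary free(). */
static void free_aligned(void *p)
{
    free(p);
}

/* Nonzero if p is a multiple of align (a power of two). */
static int ptr_is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Note that `posix_memalign()` reports failure through its return value rather than through `errno`, hence the comparison against 0.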

> Since cmem is documented to be used from the main program file (iow the
> user's code), that would nicely put the responsibility there.

Yes, but this still doesn't answer the question why my cmem16 doesn't work
for complex apps, while it seems to pass all simple heap testcases. :)

(On a slightly related note, did anyone run current trunk compiler with
cmem allocator through valgrind recently? I seem to get quite some "using
uninitialized memory" hits.)

Charlie


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Karoly Balogh (Charlie/SGR)
Hi,

On Wed, 19 Nov 2014, Jonas Maebe wrote:

> > Since the RTL's allocator is documented to align to 16 bytes
>
> Where?

Ok, that's actually a very good question. :) I didn't find it anywhere,
except some earlier ML/forum posts revealed by Google.

However, in practice it still seems to align to 16 bytes, and I asked
several people (compiler, RTL, Lazarus developers) during the FPC/Lazarus
conference last weekend in the Netherlands and the consensus was, it's
known the heap manager aligns to 16 bytes, it's designed to do that, and
in general it's a feature, which should be documented if it's not.

But yeah, everyone appended "but better ask Jonas". :)

Charlie


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Marco van de Voort
In our previous episode, Karoly Balogh (Charlie/SGR) said:
> > > At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
> > > 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
> > > to 4 bytes and 8 bytes, respectively.
> >
> > Since cmem is intended for use by FPC, I don't see this as a serious issue
> > unless somebody is exchanging malloc()ed blocks between Pascal and C code.
> 
> Since the RTL's allocator is documented to align to 16 bytes, it's
> definitely an issue also with Pascal code. We do have code where also
> Pascal side triggers the alignment issue, but indeed, the main issue is
> with linked C libs, depending on pointers from the Pascal side.

I'm not an alignment expert, but only when loading types that are larger
than the pointer size, since only those are not naturally aligned and so
could cross cacheline boundaries?
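
The cache-line question can be made concrete: an access of `n` bytes at address `addr` straddles a line boundary when its first and last bytes fall in different lines. The 64-byte line size below is an assumption (common on current x86 and many ARM cores, but not universal), and the helper is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; the real value varies by CPU. */
#define CACHE_LINE ((uintptr_t)64)

/* Hypothetical helper: nonzero if the n-byte access starting at addr
   touches two cache lines. Requires n >= 1. */
static int crosses_cache_line(uintptr_t addr, size_t n)
{
    assert(n >= 1);
    return (addr / CACHE_LINE) != ((addr + n - 1) / CACHE_LINE);
}
```

So an 8-byte double at an 8-byte-aligned address never crosses a 64-byte line, but a 16-byte SSE operand at an address that is only 8-byte aligned (for example 56) can, which is one reason pointer-sized alignment is not always enough.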

Note that 16 bytes is not enough; certain AVX instructions are, IIRC,
documented to require 32-byte alignment, and it is also recommended for
speed (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors)

Anyway, I don't see a problem with having a cmemalign16 (or -32 or whatever)
unit, but I wouldn't blow up allocation unnecessarily if there hasn't been a
real problem in most cases.
 
Since cmem is documented to be used from the main program file (iow the
user's code), that would nicely put the responsibility there.


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Jonas Maebe


On 19 Nov 2014, at 11:49, Karoly Balogh (Charlie/SGR) wrote:

> Since the RTL's allocator is documented to align to 16 bytes

Where? At least http://www.freepascal.org/docs-html/prog/progsu173.html
only says that the size is rounded up to a multiple of 16/32 bytes; it
doesn't say anything about the alignment. Besides, that page is also very
much out of date (there is no such thing as "HeapPtr" anymore), and I
don't think we should be documenting that kind of stuff at all since it
can change at any time (only the interface is relevant/defined). If there
were a guaranteed alignment (which there isn't at this time, afaik), then
that would be part of the interface contract, of course.
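
The difference between rounding sizes and aligning pointers can be illustrated with a toy bump allocator: if every size is rounded to a multiple of 16 but the arena happens to start 8 bytes past a 16-byte boundary, every block inherits that 8-byte offset. This is a hypothetical model for illustration, not how FPC's heap manager is implemented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bump allocator (hypothetical): hands out consecutive blocks from
   an arena, rounding each request up to a multiple of 16 bytes. */
struct bump {
    uintptr_t next; /* address of the next block to hand out */
};

static uintptr_t bump_alloc(struct bump *b, size_t size)
{
    uintptr_t p = b->next;
    size_t rounded = (size + 15) & ~(size_t)15; /* size rounding only */
    assert(rounded >= size);                    /* catch wrap-around */
    b->next += rounded;
    return p; /* alignment of p depends solely on where the arena began */
}
```

Starting from an arena at 0x1008, every returned address is 8 mod 16 even though all sizes are multiples of 16, so size rounding alone does not amount to an alignment guarantee.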



Jonas


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Karoly Balogh (Charlie/SGR)
Hi,

On Wed, 19 Nov 2014, Mark Morgan Lloyd wrote:

> > At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
> > 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
> > to 4 bytes and 8 bytes, respectively.
>
> Since cmem is intended for use by FPC, I don't see this as a serious issue
> unless somebody is exchanging malloc()ed blocks between Pascal and C code.

Since the RTL's allocator is documented to align to 16 bytes, it's
definitely an issue also with Pascal code. We do have code where also
Pascal side triggers the alignment issue, but indeed, the main issue is
with linked C libs, depending on pointers from the Pascal side.

> > This causes multiple performance and other issues, especially on
> > processors which require stricter alignment (most ARM CPUs, but also x86
> > with SSE, etc).
>
> I'm not sure to what extent this remains an issue with current ARM chips. I've
> got limited ARM hardware, but some tests that I did with somebody else a few
> months ago didn't show up any issues.

We do have limited ARM hardware, based on older ARM cores where this is an
issue. We use FPC in production on these chips, so it's an issue for us.
And since these cores remain in production for the coming years (not just
for us, but in general), the compiler and libs have to deal with it.

> Perhaps the most serious scenario is where an architecture or particular
> implementation requires alignment, but the kernel traps alignment errors and
> fixes them silently. SPARC Solaris does this and my understanding is that it
> introduces a very significant performance overhead;

Linux also does this. Actually, there's plenty of hardware where this is
an issue. Almost all "RISC" chips, especially embedded ones, have
alignment restrictions to some degree. I know older PPC and recent Power
chips have them as well. And even the fastest CPUs have some performance
penalty when doing unaligned accesses, even if the hardware solves it
itself and it doesn't involve the software side.

> ARM Linux may also do it (where demanded by the hardware) but my
> understanding is that notifications can be enabled.

Yes, we have these notifications enabled, and we're flooded with them,
when using the cmem allocator. This is why I started working on this.

Charlie


Re: [fpc-devel] CMem allocator memory alignment experiment

2014-11-19 Thread Mark Morgan Lloyd

Karoly Balogh (Charlie/SGR) wrote:

> Hi,
>
> I think there are several issues with the cmem memory allocator. The main
> issue is that it "breaks" the underlying malloc() memory alignment, by adding
> a four/eight byte size value to the start of each block for the sole
> reason to be able to throw Runtime Error 204 in case someone tries to free
> a block with the wrong size.
>
> At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
> 128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
> to 4 bytes and 8 bytes, respectively.

Since cmem is intended for use by FPC, I don't see this as a serious 
issue unless somebody is exchanging malloc()ed blocks between Pascal and 
C code. However I'm not saying that it's not worth fixing.

> This causes multiple performance and other issues, especially on
> processors which require stricter alignment (most ARM CPUs, but also x86
> with SSE, etc).


I'm not sure to what extent this remains an issue with current ARM 
chips. I've got limited ARM hardware, but some tests that I did with 
somebody else a few months ago didn't show up any issues.


It's more of a problem with SPARC particularly on Linux, but that's 
rapidly going down the tubes as a viable platform, in part because this 
very issue breaks a lot of stuff and maintainers have neither hardware 
nor incentive to investigate.


Perhaps the most serious scenario is where an architecture or particular 
implementation requires alignment, but the kernel traps alignment errors 
and fixes them silently. SPARC Solaris does this and my understanding is 
that it introduces a very significant performance overhead; ARM Linux 
may also do it (where demanded by the hardware) but my understanding is 
that notifications can be enabled.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


[fpc-devel] CMem allocator memory alignment experiment

2014-11-18 Thread Karoly Balogh (Charlie/SGR)
Hi,

I think there are several issues with the cmem memory allocator. The main
issue is that it "breaks" the underlying malloc() memory alignment, by adding
a four/eight byte size value to the start of each block for the sole
reason to be able to throw Runtime Error 204 in case someone tries to free
a block with the wrong size.

At least on Linux, malloc() is documented to align to 64 bit on 32 bit and
128 bit on 64 bit platforms, while this way cmem's GetMem() reduces that
to 4 bytes and 8 bytes, respectively.
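
The effect described above can be modelled in a few lines of C. The real cmem unit is Pascal; this is a simplified hypothetical model, not its source. The size is kept in a `size_t` directly before the user data, so the returned pointer is only `sizeof(size_t)` bytes past malloc()'s result:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a size-prefixed allocator (not cmem's source):
   the requested size is kept in a size_t directly before the block. */
static void *prefixed_getmem(size_t size)
{
    if (size > SIZE_MAX - sizeof(size_t))
        return NULL;                 /* avoid overflow in the request */
    size_t *base = malloc(sizeof(size_t) + size);
    if (base == NULL)
        return NULL;
    base[0] = size;                  /* stored so a later free can check it */
    return base + 1;                 /* only sizeof(size_t) past malloc's result */
}

/* Read the stored size from the word just before the block. */
static size_t prefixed_memsize(const void *p)
{
    assert(p != NULL);
    return ((const size_t *)p)[-1];
}

/* Step back over the size word to find the pointer malloc returned. */
static void prefixed_freemem(void *p)
{
    if (p != NULL)
        free((size_t *)p - 1);
}
```

With glibc on x86_64, where malloc() returns 16-byte-aligned memory, the pointer from `prefixed_getmem` is therefore always 8 bytes past a 16-byte boundary: fine for a double, but not for an aligned SSE load.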

This causes multiple performance and other issues, especially on
processors which require stricter alignment (most ARM CPUs, but also x86
with SSE, etc).

I created a cmem variant, which does 16 byte alignment of the returned
memory blocks, just like FPC's own Heap Manager does:

https://gist.github.com/chainq/6f69a7821cfa2503962f
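
One way to keep both the size word and malloc()'s alignment is to reserve a whole 16-byte slot in front of each block and store the size in the last word of that slot, so it still sits directly before the user pointer. This is a hypothetical C sketch of that idea, not the code in the gist; it assumes malloc() itself returns 16-byte-aligned memory (as glibc does on 64-bit targets) and that `sizeof(size_t)` is at most 16. It also shows that a resize has to go through the same header offset:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HDR ((size_t)16) /* header slot; assumed equal to malloc's alignment */

/* Hypothetical 16-byte-aligned variant of a size-prefixed allocator.
   The user pointer is HDR bytes past malloc's result, so it keeps
   malloc's alignment; the size lives in the last size_t of the slot. */
static void *aligned_getmem(size_t size)
{
    if (size > SIZE_MAX - HDR)
        return NULL;
    unsigned char *base = malloc(HDR + size);
    if (base == NULL)
        return NULL;
    unsigned char *user = base + HDR;
    ((size_t *)user)[-1] = size;     /* size directly before the block */
    return user;
}

static size_t aligned_memsize(const void *p)
{
    assert(p != NULL);
    return ((const size_t *)p)[-1];
}

static void aligned_freemem(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - HDR);
}

/* Resizing must realloc the real base and re-derive the user pointer;
   realloc copies header and data together, so the offset is preserved. */
static void *aligned_reallocmem(void *p, size_t size)
{
    if (p == NULL)
        return aligned_getmem(size);
    if (size > SIZE_MAX - HDR)
        return NULL;
    unsigned char *base = realloc((unsigned char *)p - HDR, HDR + size);
    if (base == NULL)
        return NULL;
    unsigned char *user = base + HDR;
    ((size_t *)user)[-1] = size;
    return user;
}
```

The price is a 16-byte header per block instead of 4 or 8 bytes.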

However, when I build FPC with this cmem16 allocator, the compiler
explodes. It also fails with other, larger pieces of code, and I'm unsure
why; I spent a few days debugging, but I couldn't find the issue. Ideas?

I wanted to contribute the code to the FPC SVN (after some cleanup) but
because of these issues, I couldn't.

Yes, the current alignment code is not the most optimal and wastes some
memory, but at least it should work. Must be something trivial. Ideas,
opinions, suggestions welcomed.

Charlie