Re: [fpc-devel] 034881: Request debug info for SEH (finally/except) to prevent regression when win32 switches to SEH

2019-10-29 Thread Sven Barth via fpc-devel
Martin Frb wrote on Tue, 29 Oct 2019, 20:08:

> About https://bugs.freepascal.org/view.php?id=34881
>
> First of all, big thanks to Sven for the patch.
> I had a look at it, I also looked through the alternatives again.
>
> First of all the patch would need some tweaking (but that is to be
> expected), but I am not sure it is the best way to go.
>
> Under gdb the issue is that there seems no way to access the data (added
> by the patch). And gdb itself does not seem to use it either (it does
> read the entries, but does not seem to further access them).
> Gdb actually does the same as the IDE does in non-SEH code: gdb sets
> breakpoints at "__cxa_begin_catch".
>

When implementing the patch I saw a bug report on GDB regarding the
exception blocks (that's how I learned that ICC generates them), so it
seems that it does handle them in some case... (at least to get the C++
exception variable that has been caught).

> *** Other OS ??
> Are there any plans for such exception handling?
>

FPC trunk now supports PSABIEH on i386 and x86_64 linux. You need to
recompile the compiler with -dPSABIEH as well as the library path set to
wherever libgcc.a resides (of course RTL and packages need to be recompiled
as well).


> ==
> About the patch. ...
> The need to address the issues below, depends on the outcome of the
> above
>
> 1)
> program Project1;
> begin
>   try
>     try
>       writeln;
>     except
>       writeln;
>     end;
>   except
>     writeln;
>   end;
> end.
>
> gives
> project1.lpr(2,6) Error: Internal error 2019102101
>

Yes, I didn't test nested blocks, because I wasn't yet sure whether their
debug information should be siblings or nested as well.


> 2)
> The tag for a finally block points to the code in the function containing
> the try/finally. That is the asm instruction " call fin$1"
> However, in case of an exception this code is not executed. fin$1 is
> called from __FPC_specific_handler. So those addresses do not actually
> help.
>
> The dwarf spec is not too specific, but I am not sure how good an idea
> it is to have the "catch" address range in a different function.
>

Yeah, I noticed that as well, but at least it could be used to set a
breakpoint at the called function inside that range...


> 
> Something else / Stepping
> Because finally is a subroutine, "step over / F8" does not enter it.
> That is not what the user expects.
>
> In FpDebug that can be solved.
> With GDB, that would require a lot of work, and probably slow down
> stepping quite a bit
>
> However gdb does check the function after a call statement.
> program Project1;
> label a,b;
> begin
>   writeln;
>   asm
>   call b
>   jmp a
> b:
>   nop
>   nop
>   ret
> a:
>   end;
>   writeln;
> end.
>
> And F8 will step into b (gdb 8.x). Because b is the same function. (and
> apparently gdb does some checks to distinguish this from a recursive call)
>
> So if fpc would write dwarf info where the function includes the finally
> code, then stepping would work.
> Though with the finally code currently being in front of the function
> body, it would need an entry point. (not tested...). Or the finally code
> could be moved to the end.
>
> Only tested with 64 bit.
> I can do more tests, if it is considered worth the effort.
>

For the compiler the finally code is generated essentially as a nested
function. It does not know the concept of a function following immediately
afterwards.

So that would be a rather complex undertaking that I don't believe is worth
the effort.

Regards,
Sven

>
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Michael Van Canneyt



On Tue, 29 Oct 2019, Ben Grasset wrote:

> On Sun, Oct 27, 2019 at 5:27 AM Michael Van Canneyt wrote:
>
>> Saying that the code is 'almost unusably slow' is the kind of statement
>> that does not help. I use the code almost daily in production, no
>> complaints about performance, so clearly it is usable.
>>
>> Instead, demonstrate your claim with facts, for example by creating a
>> patch that demonstrably increases performance.
>
> I was perhaps slightly exaggerating there. I use it as well in real life,
> but in many cases have found myself altering the sources to perform more
> optimally (some of which I could submit as patches, I suppose).

Please do.

As said, I rarely refuse patches for optimization for code I maintain,
exactly because I know I pay little attention to it.

Michael.


[fpc-devel] 034881: Request debug info for SEH (finally/except) to prevent regression when win32 switches to SEH

2019-10-29 Thread Martin Frb

About https://bugs.freepascal.org/view.php?id=34881

First of all, big thanks to Sven for the patch.
I had a look at it, I also looked through the alternatives again.

First of all the patch would need some tweaking (but that is to be 
expected), but I am not sure it is the best way to go.


Under gdb the issue is that there seems no way to access the data (added 
by the patch). And gdb itself does not seem to use it either (it does 
read the entries, but does not seem to further access them).
Gdb actually does the same as the IDE does in non-SEH code: gdb sets
breakpoints at "__cxa_begin_catch".


FpDebug of course could access the data, but I am not sure it is worth 
it (At least in the immediate future).


To access the data it is needed to unroll the stack, including through 
kernel and 3rd party code. Normal stack unrolling can fail in such cases.
Of course it can be done with the help of the seh data itself. And that 
may be a path worth pursuing, even if only to get better stack traces. 
(Not sure yet what priority that may get...)


On win64 the unroll data can actually be captured by breaking on 
__FPC_specific_handler and using dispatch.ControlPc / So that might be 
easy (in fpdebug).
On win32 this is not available. There is some dispatch argument, but I 
found no info what it contains.


--
I found some alternatives. (Comments welcome)
They do depend on implementation details, but so does having a 
breakpoint at fpc_raiseexception, fpc_catch, or even 
__FPC_specific_handler.


*** Detecting on Win64
** Except
On win64, the address of except handlers can already be detected, by 
breaking on the kernel's RtlUnwindEx


** Finally
Breaking on __FPC_specific_handler the debugger can access "HandlerData" 
(a table with finally addresses for the  frame (on each call))
That works as long as those tables do not change in format. 
"HandlerData" is the only bit, that is compiler specific. The other 
structures are given by the OS.


The detected addresses can be verified, as they must have line info, and 
for finally blocks the function name must match .*fin\$.*
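
As a rough illustration of the name check: the pattern .*fin\$.* matches any
name that contains the literal text fin$, so a plain substring search is
enough. This is only a sketch. The helper name LooksLikeFinallyHandler and
the sample symbol names are made up for the example; a real debugger would
take the names from the symbol or line info it already has.

```pascal
program CheckFinName;
{$mode objfpc}{$H+}

// Hypothetical helper: .*fin\$.* matches exactly those names that
// contain the literal substring 'fin$' (case-sensitive here).
function LooksLikeFinallyHandler(const SymbolName: string): Boolean;
begin
  Result := Pos('fin$', SymbolName) > 0;
end;

begin
  // Both symbol names are invented for illustration only.
  WriteLn(LooksLikeFinallyHandler('P$PROJECT1_$$_fin$1'));  // TRUE
  WriteLn(LooksLikeFinallyHandler('P$PROJECT1_$$_main'));   // FALSE
end.
```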

All this should work through gdb.

Not sure if it is worth writing dwarf info for __FPC_specific_handler  
and its arguments. Then of course the arguments format could be checked, 
if they have the correct format.

At the moment, I am quite happy to access them without dbg info.


Alternatively, it is possible to use dispatch.ControlPc to find the
function in which to look for finally handlers. FpDebug could use that 
to combine it with the dwarf from the patch.
But since FpDebug knows the version of fpc that created the file, it 
also knows if it can trust the "HandlerData".



*** Detecting on Win32
The debugger can break on __FPC_finally_handler  and __FPC_except_handler
Both have frame.HandlerArg as argument which (depending on the other 
args) is the address of the next finally/except handler.


Again that relies on fpc not changing what HandlerArg contains.
But since it can be checked to be an address with line info, it should
be usable (and should work through gdb).


*** Other OS ??
Are there any plans for such exception handling?

==
About the patch. ...
The need to address the issues below, depends on the outcome of the 
above


1)
program Project1;
begin
  try
    try
      writeln;
    except
      writeln;
    end;
  except
    writeln;
  end;
end.

gives
project1.lpr(2,6) Error: Internal error 2019102101

2)
The tag for a finally block points to the code in the function containing
the try/finally. That is the asm instruction " call fin$1"
However, in case of an exception this code is not executed. fin$1 is 
called from __FPC_specific_handler. So those addresses do not actually help.


The dwarf spec is not too specific, but I am not sure how good an idea 
it is to have the "catch" address range in a different function.



Something else / Stepping
Because finally is a subroutine, "step over / F8" does not enter it. 
That is not what the user expects.


In FpDebug that can be solved.
With GDB, that would require a lot of work, and probably slow down 
stepping quite a bit


However gdb does check the function after a call statement.
program Project1;
label a,b;
begin
  writeln;
  asm
  call b
  jmp a
b:
  nop
  nop
  ret
a:
  end;
  writeln;
end.

And F8 will step into b (gdb 8.x). Because b is the same function. (and 
apparently gdb does some checks to distinguish this from a recursive call)


So if fpc would write dwarf info where the function includes the finally 
code, then stepping would work.
Though with the finally code currently being in front of the function 
body, it would need an entry point. (not tested...). Or the finally code 
could be moved to the end.


Only tested with 64 bit.
I can do more tests, if it is considered worth the effort.



Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Ben Grasset
On Sun, Oct 27, 2019 at 5:27 AM Michael Van Canneyt 
wrote:

> Saying that the code is 'almost unusably slow' is the kind of statement
> that does
> not help. I use the code almost daily in production, no complaints about
> performance, so clearly it is usable.
>
> Instead, demonstrate your claim with facts, for example by creating a
> patch that
> demonstrably increases performance.
>

I was perhaps slightly exaggerating there. I use it as well in real life,
but in many cases have found myself altering the sources to perform more
optimally (some of which I could submit as patches, I suppose).

On Sun, Oct 27, 2019 at 5:27 AM Michael Van Canneyt 
wrote:

> If you genuinely believe that micro-optimization changes can make a
> difference:
>
> Submit patches. When focused and well explained, I doubt they will be
> refused.
>

The stuff that I'm particularly concerned about is usually more along the
lines of "small things that add up in significant ways in the context of
long-running programs", so while they might be "micro" on their own I
wouldn't necessarily call them that in context of larger overall situations.

On Sun, Oct 27, 2019 at 5:46 AM Florian Klämpfl 
wrote:

> Another point: for example
> explicit inline increases normally code size (not always but often)


I've had the opposite experience in most cases. The code FPC generates for
something like four un-inlined functions in a situation where each one
calls the next is generally significantly bigger due to the setup for the
parameters being passed in / etc. Whereas if it's inlining all of them it
seems to be able to do a much better job of combining "redundant" things
and optimizing based on that, which tends to give a much smaller result.

Again, in a world where robust autoinlining was the default I'd happily
rely on it exclusively, as it's not as though I specifically *want* to have
to add the "inline" modifier in particular places.


Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread J. Gareth Moreton

On 29/10/2019 14:24, Michael Van Canneyt wrote:

> On Tue, 29 Oct 2019, J. Gareth Moreton wrote:
>
>> Please note that only Marco's e-mails are making the list.  I don't
>> see Michael's responses.
>
> That's probably because I am not responding ;-)
>
> Michael.

Yep, just noticed that Marco was responding to your messages from a few
days ago!  Perception fail!


In regards to passing everything into XMM0, try running 
"tests/test/cg/tvectorcall1.pp" on Linux.  It's a bit of a weird test 
because there's a lot of Win64 stuff that's not compiled since it tests 
aggregates, something that only vectorcall takes advantage of.


Nevertheless, if you get an error such as 'FAIL: 
HorizontalAddSingle(HVA) has the vector in the wrong register.', then 
the System V ABI is not passing the __m128 type properly. The way it 
tests this is via a pair of functions, one in Pascal and one in assembler:


function HorizontalAddSingle(V: TM128): Single; vectorcall;
begin
  HorizontalAddSingle := V.M128_F32[0] + V.M128_F32[1] + V.M128_F32[2]
    + V.M128_F32[3];
end;

function HorizontalAddSingle_ASM(V: TM128): Single; vectorcall;
  assembler; nostackframe;
asm
  HADDPS XMM0, XMM0
  HADDPS XMM0, XMM0
end;

If the results are not equal, then the entire vector isn't in XMM0.  I 
haven't tested it on Linux as much as I would like because I have to 
boot into a virtual machine to do so, and I'm still a bit of a Linux 
novice.  I'm curious to know what the assembler dump is though.
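
To make the expectation concrete, here is a plain Pascal model of what the
two HADDPS instructions compute when source and destination are the same
register. It is only a sketch: TVec4 and HaddSelf are hypothetical names
that are not part of the test, and the model says nothing about how the ABI
actually passes the argument.

```pascal
program HaddModel;
{$mode objfpc}

type
  TVec4 = array[0..3] of Single;  // stand-in for the four lanes of XMM0

// Model of "HADDPS X, X": lanes 0 and 2 receive V[0]+V[1],
// lanes 1 and 3 receive V[2]+V[3].
function HaddSelf(const V: TVec4): TVec4;
begin
  Result[0] := V[0] + V[1];
  Result[1] := V[2] + V[3];
  Result[2] := V[0] + V[1];
  Result[3] := V[2] + V[3];
end;

const
  V: TVec4 = (1.0, 2.0, 3.0, 4.0);
var
  R: TVec4;
begin
  // After two horizontal adds every lane holds the full sum.
  R := HaddSelf(HaddSelf(V));
  WriteLn(R[0]:0:1);                          // 10.0
  WriteLn((V[0] + V[1] + V[2] + V[3]):0:1);   // 10.0
end.
```

If only part of the vector arrived in XMM0, the low lane would not hold the
sum of all four fields, and that is the mismatch the test reports.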


Gareth aka. Kit


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread J. Gareth Moreton

Oh, I just noticed you're replying to messages from a few days ago.  Oops!

There is no right way of going about optimisation.  I'm of the school 
that if you can give the compiler a helpful hint, without complicating 
the code, then do it.


In one way I compare it to the id Tech (Quake) and Unreal engines back 
in the 90s and early 2000s.  When making maps, the id Tech engines
attempted to compile everything themselves when it came to determining what
was visible and what they should cull - as a result, the compilation
process would take a long time and there were some situations where it 
could easily fall apart due to rounding errors or just some glitch in 
the tree.  The Unreal engine, on the other hand, had /you/, the map 
designer, decide what was visible and what wasn't, and had you decide 
where to place portals and other hints to the engine.  This was useful 
because it was much easier to subdivide areas if you were sensible about 
it and hence the Unreal engine could handle much more complex outdoor 
scenes, for example.  The cost though, especially with later versions of 
the Unreal engine that added more features, is that it was very hard for 
a novice to get started - for example, the 'terrain' feature didn't do 
any automatic visibility culling, so if you had a large hill, for 
example, you would have to insert an 'anti-portal' underneath it to give 
a hint to the engine that if it is within the viewport, any polygons
behind it are invisible (which causes very weird artefacts if you place
one in the middle of an open room).


I like to take a middle ground, especially as the Pascal compiler has a
reputation of being fast.  A smart compiler is a good compiler, but
expecting it to know which procedures should be auto-vectorised,
especially with old source code and no rules on memory alignment, is
either impossible or will take a disproportionately long time.  Other
times it's an excuse for lazy programming!


As for the vectorcall tests, they should vectorise the entire argument 
on both x86_64-win64 and x86_64-linux.  If not, there's a bug 
somewhere.  I'll have a look.


Gareth aka. Kit





Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Michael Van Canneyt



On Tue, 29 Oct 2019, J. Gareth Moreton wrote:

> Please note that only Marco's e-mails are making the list.  I don't see
> Michael's responses.

That's probably because I am not responding ;-)

Michael.


Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread J. Gareth Moreton
Please note that only Marco's e-mails are making the list.  I don't see 
Michael's responses.


Gareth aka. Kit



Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Marco van de Voort


On 2019-10-27 at 10:27, Michael Van Canneyt wrote:

> Absolutely.
>
> Personally, I don't have any concern for performance in this sense.
> Almost zero.
> I invariably favour code simplicity over performance, for sake of
> maintenance.


But there is another kick-in-the-open-door statement about performance:
that most of the performance is gained in a relatively small part of the code.


To tackle that you need tools to force the compiler to behave a certain 
way that might not (yet?) be doable on the compiler side. IMHO it is 
unfair to deem this all microoptimization just because it doesn't hurt you.




> For good reason: for the kind of code which I create daily, the kind of
> micro-optimizations that you seem to refer to, are utterly insignificant,
> and I expect the compiler to handle them. If it currently does not, then I
> think the compiler, rather than the code, must be improved.


Just the vectorizing will probably more than double the performance. 
Just look at the asm that I posted and imagine reducing it to one 
instruction.


And while said FFT unit is not yet a performance bottleneck for us now,
it has been marked as a relatively large factor of the measurement time.
(iirc it is about 1ms for a 400 sample array on somewhat older hardware)


And what is exactly needed might change at any given moment. If a new 
camera comes out, if processing can keep up you can process more samples 
which in turn reduces errors and improves the measurement nearly 
automatically


Doing the same purely algorithmically usually means  weeks-months of 
hard maths trying to improve signal quality, and after that validating 
that for umpteen products and customers etc etc. Believe me, 
"Microoptimization" then sounds very tempting.


If Gareth can get this running enough to show that the FFT reduces 
instructions, I can just stuff it in a DLL, and have it lying on a shelf 
to insert into the Delphi app when needed. Which would be great.


> Code should not entirely disregard optimization, but then it should be on a
> higher level: don't use bubble sort when you can use a better sort. No
> amount of micro-optimization will make bubble sort outperform quicksort.


(

Interesting example, I'm not really a hardcore algorithms man, but I can 
think of some potential problems with that statement:


1 that only goes for N->Infinity and that computers don't have infinite 
resources. If quicksort uses more memory (e.g. to track state) it might 
not apply in certain circumstances.


2   if your swap() function is extremely expensive, sorting an already 
sorted array is more expensive with quicksort because it is a non stable 
sort.


3 the non recursive bubble sort might be easier to unroll and then 
optimize by the compiler in cases of sorting a fixed number of items. 
(e.g. ordering the elements of a short vector)


)
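
Point 3 can be sketched for a three-element vector, where bubble sort
reduces to a fixed sequence of three compare-and-swap steps with no loop at
all. The names TVec3, CompareSwap and SortVec3 are made up for this
example, and whether the compiler really inlines and optimizes it well is
an assumption, not something measured here.

```pascal
program Sort3Demo;
{$mode objfpc}
{$inline on}

type
  TVec3 = array[0..2] of Single;

// Swap A and B if they are out of order.
procedure CompareSwap(var A, B: Single); inline;
var
  T: Single;
begin
  if A > B then
  begin
    T := A;
    A := B;
    B := T;
  end;
end;

// Bubble sort for exactly three items, written out without loops:
// first pass compares (0,1) and (1,2), second pass compares (0,1).
procedure SortVec3(var V: TVec3);
begin
  CompareSwap(V[0], V[1]);
  CompareSwap(V[1], V[2]);
  CompareSwap(V[0], V[1]);
end;

var
  V: TVec3 = (3.0, 1.0, 2.0);
begin
  SortVec3(V);
  WriteLn(V[0]:0:1, ' ', V[1]:0:1, ' ', V[2]:0:1);  // 1.0 2.0 3.0
end.
```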

Anyway, besides the fun, the "algorithms" mantra is only a first order 
guideline, not an absolute truth.


> Saying that the code is 'almost unusably slow' is the kind of statement
> that does not help. I use the code almost daily in production, no
> complaints about performance, so clearly it is usable.


True. Claims should be proven, and with code that does something (not 
with simply a loop around a single operation)


But that is why I brought up the FFT unit. It is possible that that is 
such a case.





Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Marco van de Voort


On 2019-10-27 at 10:46, Florian Klämpfl wrote:

> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>> If you genuinely believe that micro-optimization changes can make a
>> difference:
>>
>> Submit patches.
>
> As said: I am against applying them. Why? They clutter code and after
> all, they make assumptions about the current target which might not
> always be valid. And time testing them is much better spent in improving
> the compiler and then all code benefits. Another point: for example
> explicit inline increases normally code size (not always but often),
> so it is against the use of -Os. Applying inline manually on umpteen
> subroutines makes no sense. Better improve auto inlining.


Auto inlining is also no panacea.   It only works with heuristics, and 
is thus only as good as a formula of the heuristic.


Changing calling conventions, vectorizing and loops all complicate that,
and it will never be perfect, and a change here will lead to a problem
there etc.


If you know a routine can evaluate to one instruction in most cases, I 
don't see anything wrong with just marking it as such.
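
A minimal sketch of what such a marking looks like, assuming a made-up
routine AddSingles. The inline modifier is only a request, so the compiler
may still decide not to inline the call.

```pascal
program InlineDemo;
{$mode objfpc}
{$inline on}

// Hypothetical routine that should reduce to a single add instruction
// once it is inlined at the call site.
function AddSingles(const A, B: Single): Single; inline;
begin
  Result := A + B;
end;

begin
  WriteLn(AddSingles(1.5, 2.0):0:1);  // 3.5
end.
```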





Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Marco van de Voort


On 2019-10-29 at 12:23, J. Gareth Moreton wrote:
> When it comes to testing vectorcall, uComplex isn't the best example
> actually because most of the operators are inlined.  There are a
> number of tests under "tests/test/cg" that test vectorcall and the
> System V ABI using a Pascal implementation of the opaque __m128 type
> (the two ABIs should behave exactly the same when dealing with simple
> vectors).


The last time I checked it didn't vector anything at all. So only the 
native vectorizing of the record of two singles would be nice.


Last time I checked in 2017, complexadd inlined looked something like this:

    leal    32(%eax),%edx
    leal    8(%eax),%ecx
    vmovss    (%ecx),%xmm0
    vaddss    (%edx),%xmm0,%xmm0
    vmovss    %xmm0,-8(%ebp)
    vmovss    4(%ecx),%xmm0
    vaddss    4(%edx),%xmm0,%xmm0
    vmovss    %xmm0,-4(%ebp)

And I realize quite some rearrangements must be done.



> If anything though, the example function you gave (I'll need to
> double-check what ComplexScl does though, if it isn't a simple
> multiplication)


It is simple multiplication of both real and imaginary with a scalar (as 
opposed to complex*complex which has more terms).
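
In code that amounts to something like the sketch below. The record layout
and the parameter names are assumptions for illustration (the real unit can
be switched between single and double); only the argument order, scalar
first, is taken from the calls in FFT_5.

```pascal
program SclDemo;
{$mode objfpc}

type
  // Assumed layout; Double is chosen here only for the example.
  TComplex = record
    Re, Im: Double;
  end;

// Scale both the real and the imaginary part by the same scalar S.
function ComplexScl(const S: Double; const Z: TComplex): TComplex;
begin
  Result.Re := S * Z.Re;
  Result.Im := S * Z.Im;
end;

var
  Z, R: TComplex;
begin
  Z.Re := 1.5;
  Z.Im := -2.0;
  R := ComplexScl(2.0, Z);
  WriteLn(R.Re:0:1, ' ', R.Im:0:1);  // 3.0 -4.0
end.
```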


> would be a pretty solid and heavy-duty test of the compiler attempting
> to vectorise the code - in an ideal world, individual calls to
> ComplexAdd and ComplexSub (which are simple + and - operations in
> uComplex) will compile into a single line of assembly language (ADDPD
> and SUBPD respectively).  Nevertheless, one could disable the inlining
> to see how well the compiler handles the function chaining, since with
> aligned data, the result from XMM0 should be easily transposed in one
> go to another XMM register if not just left alone as parameter data
> for the next function.



Yes, it is just a somewhat realworld codebase to play with. It is MPL even.


Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread J. Gareth Moreton
When it comes to testing vectorcall, uComplex isn't the best example 
actually because most of the operators are inlined.  There are a number 
of tests under "tests/test/cg" that test vectorcall and the System V ABI 
using a Pascal implementation of the opaque __m128 type (the two ABIs 
should behave exactly the same when dealing with simple vectors).


If anything though, the example function you gave (I'll need to 
double-check what ComplexScl does though, if it isn't a simple 
multiplication) would be a pretty solid and heavy-duty test of the 
compiler attempting to vectorise the code - in an ideal world, 
individual calls to ComplexAdd and ComplexSub (which are simple + and - 
operations in uComplex) will compile into a single line of assembly 
language (ADDPD and SUBPD respectively).  Nevertheless, one could 
disable the inlining to see how well the compiler handles the function 
chaining, since with aligned data, the result from XMM0 should be easily 
transposed in one go to another XMM register if not just left alone as 
parameter data for the next function.


Gareth aka. Kit


On 29/10/2019 11:06, Marco van de Voort wrote:


On 2019-10-27 at 09:02, Florian Klämpfl wrote:
I guess you're right.  It just seems weird because the System V ABI 
was designed from the start to use the MM registers fully, so long as 
the data is aligned.  In effect, it had vectorcall wrapped into its 
design from the start. Granted, vectorcall has some advantages and 
can deal with relatively complex aggregates that the System V ABI 
cannot handle (for example, a record type that contains a normal 
vector and information relating to bump mapping).


I just hoped that making updates to uComplex, while ensuring 
existing Pascal code still compiles, would help take advantage of 
modern ABI designs.


Is there currently any example which shows that vectorcall has any 
advantage with FPC? Else I would propose first to make FPC able to 
take advantage of it and then talk about if we really add vectorcall. 
Currently I fear, FPC gets only into trouble when using vectorcall as 
it tries first to push everything into one xmm register and then 
splits this again in the callee.


Nils Haeck's FFT unit might be interesting. (same guy as nativejpg 
unit iirc, http://www.simdesign.nl)


It is a D7 language level unit that uses a complex record and simple 
procedures as options. It should be easy to transpose to ucomplex. It 
is quite hll and switchable between single and double. (I use it in 
single mode, but to test vectorcall, obviously double mode would be 
best?)


And it has routines that do a variety of complex operations.

procedure FFT_5(var Z: array of TComplex);
// usage of open array is to make things generic. Could be solved differently.
var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);

  T5   := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1   := ComplexScl(c51, T5);
  M2   := ComplexScl(c52, ComplexSub(T1, T2));

  M3.Re := -c53 * (T3.Im + T4.Im);  // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Im :=  c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im :=  c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im :=  c55 * T3.Re;

  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);

  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;



Re: [fpc-devel] Question on updating FPC packages

2019-10-29 Thread Marco van de Voort


On 2019-10-27 at 09:02, Florian Klämpfl wrote:
>> I guess you're right.  It just seems weird because the System V ABI
>> was designed from the start to use the MM registers fully, so long as
>> the data is aligned.  In effect, it had vectorcall wrapped into its
>> design from the start.  Granted, vectorcall has some advantages and
>> can deal with relatively complex aggregates that the System V ABI
>> cannot handle (for example, a record type that contains a normal
>> vector and information relating to bump mapping).
>>
>> I just hoped that making updates to uComplex, while ensuring existing
>> Pascal code still compiles, would help take advantage of modern ABI
>> designs.
>
> Is there currently any example which shows that vectorcall has any
> advantage with FPC? Else I would propose first to make FPC able to
> take advantage of it and then talk about if we really add vectorcall.
> Currently I fear, FPC gets only into trouble when using vectorcall as
> it tries first to push everything into one xmm register and then
> splits this again in the callee.


Nils Haeck's FFT unit might be interesting (same author as the NativeJpg
unit, iirc; http://www.simdesign.nl).


It is a Delphi 7 (D7) language level unit that uses a complex record and
simple procedures as options. It should be easy to port to uComplex. It is
fairly high-level and can be switched between single and double precision.
(I use it in single mode, but to test vectorcall, double mode would
presumably be best?)


And it has routines that do a variety of complex operations.

// usage of open array is to make things generic. Could be solved differently.
procedure FFT_5(var Z: array of TComplex);
var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);

  T5   := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1   := ComplexScl(c51, T5);
  M2   := ComplexScl(c52, ComplexSub(T1, T2));

  // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Re := -c53 * (T3.Im + T4.Im);
  M3.Im :=  c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im :=  c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im :=  c55 * T3.Re;

  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);

  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;
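
To try a routine like the one above in isolation, the record and helpers it relies on could look roughly like this. This is only a minimal sketch and not Nils Haeck's actual code: the TFloat alias, the field layout of TComplex, the MakeComplex helper and the parameter order of ComplexScl (scalar first, inferred from the calls above) are all assumptions.

```pascal
program ComplexSketch;
{$mode objfpc}

type
  // Assumption: one alias switches the whole unit between Single and Double.
  TFloat = Double;

  // Assumed layout: a plain record holding the real and imaginary part.
  TComplex = record
    Re, Im: TFloat;
  end;

// Hypothetical helper, only used to build test values.
function MakeComplex(ARe, AIm: TFloat): TComplex;
begin
  Result.Re := ARe;
  Result.Im := AIm;
end;

// Component-wise sum of two complex numbers.
function ComplexAdd(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

// Component-wise difference of two complex numbers.
function ComplexSub(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re - B.Re;
  Result.Im := A.Im - B.Im;
end;

// Scale a complex number by a real factor (scalar first, as in the calls above).
function ComplexScl(const S: TFloat; const A: TComplex): TComplex;
begin
  Result.Re := S * A.Re;
  Result.Im := S * A.Im;
end;

var
  A, B, C: TComplex;
begin
  A := MakeComplex(1, 2);
  B := MakeComplex(3, -1);

  C := ComplexAdd(A, B);
  WriteLn('A + B   = ', C.Re:0:1, ' ', C.Im:0:1);

  C := ComplexSub(A, B);
  WriteLn('A - B   = ', C.Re:0:1, ' ', C.Im:0:1);

  C := ComplexScl(0.5, A);
  WriteLn('0.5 * A = ', C.Re:0:1, ' ', C.Im:0:1);
end.
```

Changing TFloat to Single gives the single-precision variant, which is the kind of switch mentioned above.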



Re: [fpc-devel] Progress on reviewing x86_64 optimizer overhaul and node semantic pass

2019-10-29 Thread J. Gareth Moreton

Thanks George,

As Florian stated, my work on the optimiser overhaul has been rejected 
because of how intertwined it is, along with some elements of 'code 
smell', but there's plenty to salvage and he hasn't slammed the door on 
the concept.  I'm still learning to use git and patches, especially with 
splitting up changes.  For example, if I have changes in separate files 
that I want to split up, how might one go about it without manually 
modifying the patch files? (As an easy example, I split up the uComplex 
patches into two... one with the alignment and vectorcall changes, and 
the other with the "const" modifier in the parameters).


I haven't heard anything regarding the node semantic pass yet.

Gareth aka. Kit

On 28/10/2019 17:29, George Bakhtadze wrote:
> Oh yeah, conflict resolution is the thing nobody really gets right, but TGit is
> a bit less wrong.
> I've pretty much resigned myself to Ctrl-F ">"...
I use Intellij IDEA as VCS client (both git and svn supported).
Patches, partial commits and conflicts resolution (automatic for many 
cases) are there.
Yes, it's not a git client but a fully featured IDE but as bonus, with 
a certain plugin it supports Pascal language. ;)
I hope it'll help to get these changes into codebase as patches 
improving compiled code performance are very important at least for me.

Thanks to Gareth for this work!
---
Best regard, George



Re: [fpc-devel] Progress on reviewing x86_64 optimizer overhaul and node semantic pass

2019-10-29 Thread George Bakhtadze

> Oh yeah, conflict resolution is the thing nobody really gets right, but TGit is
> a bit less wrong.
> I've pretty much resigned myself to Ctrl-F ">"...

I use IntelliJ IDEA as a VCS client (both git and svn are supported).
Patches, partial commits and conflict resolution (automatic in many cases) are all there.
Yes, it's not a git client but a fully featured IDE, but as a bonus, with a certain plugin it supports the Pascal language. ;)

I hope it'll help to get these changes into the codebase, as patches improving compiled code performance are very important, at least for me.

Thanks to Gareth for this work!
---
Best regards, George


[fpc-devel] Attributes

2019-10-29 Thread Alfred

Hello,

Would it be possible to add a macro definition (in trunk) to indicate 
that attributes are supported ?


E.g. FPC_HAS_CUSTOMATTRIBUTES

Thanks.
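
If such a define were provided, client code could test for it with ordinary conditional compilation. The following is a minimal sketch under that assumption: FPC_HAS_CUSTOMATTRIBUTES is only the name proposed in the request above, not a define the compiler is known to set, and AttributesSupported is a hypothetical helper. Without the define the program simply takes the fallback branch.

```pascal
program AttrCheck;
{$mode objfpc}

// Hypothetical helper: reports at compile time whether the proposed
// FPC_HAS_CUSTOMATTRIBUTES define (an assumption, see above) is set.
function AttributesSupported: Boolean;
begin
{$ifdef FPC_HAS_CUSTOMATTRIBUTES}
  Result := True;
{$else}
  Result := False;
{$endif}
end;

begin
  if AttributesSupported then
    WriteLn('custom attributes: define present')
  else
    WriteLn('custom attributes: define not present');
end.
```

The same `{$ifdef}` guard could wrap the attribute declarations themselves, so that a unit still compiles with compilers that lack the feature.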