Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton
In the meantime, if everything seems present and correct, 
https://bugs.freepascal.org/view.php?id=36202 contains the alignment and 
vectorcall modifiers for uComplex.  It shouldn't affect anything outside 
of x86_64 but should still keep the unit very lightweight, which I 
believe was the original intent.


Gareth aka. Kit


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton
Hmmm, that is unfortunate if the horizontal operations are inefficient.  
I had a look at them at 
https://www.agner.org/optimize/instruction_tables.pdf - you are right in 
that HADDPS has a surprisingly high latency (approximately how many 
cycles it takes to execute), although HADDPD isn't as bad, probably 
because it's only dealing with 2 Doubles instead of 4 Singles, and it 
seems mostly equivalent in speed to the multiplication instructions.


Using just SSE2:

mulpd %xmm0,%xmm0 { re * re in the low half, im * im in the high half }
movhlps %xmm0,%xmm1 { High Double of %xmm0 -> low half of %xmm1; shufpd 
takes its low half from the destination, so it can't do this move }
addsd %xmm1,%xmm0
sqrtsd %xmm0,%xmm0

Ultimately it's not much better than what you have:

movhlps %xmm0,%xmm1 { Only needed if both fields are in %xmm0 }
mulsd %xmm0,%xmm0
mulsd %xmm1,%xmm1
addsd %xmm1,%xmm0
sqrtsd %xmm0,%xmm0

If you follow the dependencies between the instructions (the shuffle and 
the first mulsd can run simultaneously, and the second mulsd only has to 
wait for the shuffle), the longest chain is still 4 instructions in both 
versions.  That amounts to 4 cycles if you assume each instruction takes 
an equal amount of time to execute (which they don't, but it's a 
reasonable approximation).  The subroutines are also probably too small 
to get accurate timing metrics on them.  It might be something to 
experiment on though - I would hope at the very least that the 
horizontal operations have improved in later years.
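
To make the dependency counting above concrete, here is a minimal Free Pascal sketch. It models each instruction as a node with up to two producers and computes the longest dependency chain, assuming every instruction costs one step (the same simplification as in the text). The names TInstr and CriticalPath and the two dependency tables are illustrative assumptions, not anything taken from the compiler.

```pascal
program CriticalPathSketch;
{$mode objfpc}

type
  { Hypothetical model: an instruction and the indices of the (at most two)
    earlier instructions whose results it consumes; -1 means "no producer". }
  TInstr = record
    Name: string[15];
    Dep1, Dep2: Integer;
  end;

const
  { Packed variant: mulpd, shuffle high -> low, addsd, sqrtsd }
  PackedSeq: array[0..3] of TInstr = (
    (Name: 'mulpd';   Dep1: -1; Dep2: -1),
    (Name: 'shuffle'; Dep1:  0; Dep2: -1),
    (Name: 'addsd';   Dep1:  0; Dep2:  1),
    (Name: 'sqrtsd';  Dep1:  2; Dep2: -1)
  );
  { Scalar variant: shuffle, mulsd, mulsd, addsd, sqrtsd }
  ScalarSeq: array[0..4] of TInstr = (
    (Name: 'shuffle'; Dep1: -1; Dep2: -1),
    (Name: 'mulsd';   Dep1: -1; Dep2: -1),
    (Name: 'mulsd';   Dep1:  0; Dep2: -1),
    (Name: 'addsd';   Dep1:  1; Dep2:  2),
    (Name: 'sqrtsd';  Dep1:  3; Dep2: -1)
  );

{ Length of the longest dependency chain, counting one step per instruction. }
function CriticalPath(const Code: array of TInstr): Integer;
var
  Depth: array of Integer;
  i, d: Integer;
begin
  Result := 0;
  SetLength(Depth, Length(Code));
  for i := 0 to High(Code) do
  begin
    d := 0;
    if (Code[i].Dep1 >= 0) and (Depth[Code[i].Dep1] > d) then
      d := Depth[Code[i].Dep1];
    if (Code[i].Dep2 >= 0) and (Depth[Code[i].Dep2] > d) then
      d := Depth[Code[i].Dep2];
    Depth[i] := d + 1;
    if Depth[i] > Result then
      Result := Depth[i];
  end;
end;

begin
  WriteLn('packed: ', CriticalPath(PackedSeq));   { packed: 4 }
  WriteLn('scalar: ', CriticalPath(ScalarSeq));   { scalar: 4 }
end.
```

Both tables give a chain of 4, which is why the packed version does not come out ahead under this simple model; real latencies from the instruction tables would have to be plugged in to say more.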


I know though that vectorising instructions is, by and large, a net 
gain.  For example, let's go to a simpler example of adding two complex 
numbers together:


  operator + (z1, z2 : complex) z : complex; vectorcall;
  {$ifdef TEST_INLINE}
  inline;
  {$endif TEST_INLINE}
    { addition : z := z1 + z2 }
    begin
      z.re := z1.re + z2.re;
      z.im := z1.im + z2.im;
    end;

No horizontal adds here, just a single packed addition that stores the 
result in %xmm0, as opposed to two scalar additions followed by combining 
the results in whatever way is demanded (if aligned, it's all in %xmm0; 
if unaligned, I think %xmm0 and %xmm1 are supposed to be used).  Mind 
you, in this case the function is inlined, so the parameter passing 
doesn't always apply.


Once again though, I was surprised at how inefficient HADDPS is once you 
pointed it out.  The double-precision versions aren't nearly as bad 
though, so maybe they can still be used.


Gareth aka. Kit

P.S. As far as 128-bit aligned vector types are concerned, vectorcall 
and the System V ABI can be considered equivalent. Vectorcall can use 
more MM registers for return values and more complex aggregates as 
parameters, but in our examples, we don't have to worry about that yet.



On 23/10/2019 21:20, Florian Klämpfl wrote:
> On 22.10.19 at 05:01, J. Gareth Moreton wrote:
>> mulpd    %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
>> haddpd   %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }
>
> Unfortunately, those horizontal operations are normally not very 
> efficient IIRC.



Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton
So I did a bit of reading after finding the "mpx-linux64-abi.pdf" 
document.  As I suspected, the System V ABI is like vectorcall when it 
comes to using the XMM registers... only the types __m128, __float128 
and _Decimal128 get the "SSEUP" class for their upper half and hence use 
the entire register.  The types are opaque, but both their size and 
alignment are 16 bytes, so I think anything that abides by those rules 
can be considered equivalent.


If the complex type is unaligned, the two fields get their own XMM 
register.  If aligned, they both go into %xmm0.  At least that is what I 
gathered from reading the document - it's a little unclear sometimes.


Gareth aka. Kit

On 23/10/2019 06:59, Florian Klämpfl wrote:
> On 23 October 2019 at 01:14:03, "J. Gareth Moreton" wrote:
>
>> That's definitely a marked improvement.  Under the System V ABI and
>> vectorcall, both fields of a complex type would be passed through xmm0.
>> Splitting it up into two separate registers would require something like:
>>
>> shufpd %xmm0,%xmm1,3 { Copy the high-order Double into the low-order
>> position - an immediate operand of "1" will also work, since we're not
>> concerned with the upper 64 bits of %xmm1 }
>>
>> After which your compiled code will work correctly (since it looks like
>> %xmm1 was undefined before):
>
> The code is correct, on x86_64-linux vectorcall is ignored. Supporting 
> vectorcall with my approach would be more difficult.





Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread Florian Klämpfl

On 22.10.19 at 05:01, J. Gareth Moreton wrote:
> mulpd    %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
> haddpd   %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }

Unfortunately, those horizontal operations are normally not very 
efficient IIRC.



Re: [fpc-devel] 0036144: Wrong Dwarf2/3/4 info for array (all array, but affects bitpacked) / incorrect use of DW_AT_bit_stride

2019-10-23 Thread Martin Frb

On 23/10/19 11:34, Joost van der Sluis wrote:
> On 13-10-2019 at 00:22, Martin wrote:
>> - ShortString are encoded exactly as
>> record  len: int;   st: array of char;  end;
>> - And in dwarf 3, ansistring are encoded as array.
>
> Well. If someone creates a record called 'ShortString', (s)he should 
> not be surprised that the debugger thinks that it is actually a 
> shortstring?
>
> I do not see the issue here. The compiler generates debug-information 
> that makes it possible for any debugger to show the data correctly. 
> For shortstrings it reports a structure with a length and the actual 
> characters. This is what a shortstring is.


A user may copy a watch expression from the source (via the mouse 
hint, or by pasting it into the watch window instead of typing it). 
That may be

   FooString[5]

If shortstring is a record (gdb without help from the IDE) then the 
watch fails, because it would have to be FooString.st[5]


Of course this is not a problem in the IDE, since the IDE can rewrite the 
expression, and may even do so for a user-defined record.

Though FooString[0] should only work for shortstrings (to get the len).
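
The difference between the two watch expressions can be reproduced in plain Pascal. The sketch below overlays a ShortString with a hypothetical record, TShortStringView, shaped like the len/st description quoted above. The record name and its exact field types are assumptions made for the example, not what the compiler emits as debug info.

```pascal
program ShortStringAsRecord;
{$mode objfpc}

type
  { Hypothetical record with the same memory layout as a ShortString:
    one length byte followed by up to 255 characters. }
  TShortStringView = packed record
    len: Byte;
    st: array[1..255] of Char;
  end;

var
  FooString: ShortString;
  { "absolute" places View at the same address as FooString. }
  View: TShortStringView absolute FooString;

begin
  FooString := 'abcdefgh';
  WriteLn(FooString[5]);        { string indexing, 5th character: e }
  WriteLn(View.st[5]);          { same byte via the record field: e }
  WriteLn(Ord(FooString[0]));   { length byte of the shortstring: 8 }
  WriteLn(View.len);            { the same length as a record field: 8 }
end.
```

A debugger that only sees the record has to be given the second and fourth forms; the first and third only make sense once it knows the value is a string.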

The problem is more severe in cases (dwarf2) where ansistring and pchar 
are indistinguishable, because s[1] can be the 1st or the 2nd char. And 
that means WRONG data can be displayed.
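
The off-by-one comes from the indexing base: an ansistring starts at index 1, a pchar at index 0. A small sketch, with variable names picked for illustration only:

```pascal
program IndexBaseSketch;
{$mode objfpc}{$H+}

var
  a: AnsiString;
  p: PChar;

begin
  a := 'hello';
  p := PChar(a);      { points at the same characters }
  WriteLn(a[1]);      { ansistring is 1-based: prints h }
  WriteLn(p[1]);      { pchar is 0-based: prints e }
  WriteLn(p[0]);      { first character via the pchar: prints h }
end.
```

So a debugger that cannot tell the two types apart has no way of knowing which character s[1] is meant to be.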


In dwarf3 it is array of char vs ansistring.
And while ansistring should usually display utf8 as text, array of char 
is one "8bit" char at a time. (IMHO)



Re: [fpc-devel] 0036144: Wrong Dwarf2/3/4 info for array (all array, but affects bitpacked) / incorrect use of DW_AT_bit_stride

2019-10-23 Thread Martin Frb

On 23/10/19 11:34, Joost van der Sluis wrote:
> On 13-10-2019 at 00:22, Martin wrote:
>> I have a few wishes with regards to: 
>> https://bugs.freepascal.org/view.php?id=36144
>>
>> 1)
>> FpDebug detects fpc as dwarf provider, and checks the fpc version. 
>> Based on this it can interpret the misplaced tags, and work around 
>> the issue.
>> I have now configured 3.3.0 as the cut-off for the workaround (since 
>> fpc now puts the tags in the correct location).
>>
>> Should this patch get merged to 3.2 then please alert me of this. So 
>> I can adjust the check in FpDebug.
>
> About the patch: I made the original change within the period that I 
> tried to get multiple-dimensional arrays to be displayed correctly in 
> gdb. Especially arrays of ansistrings. In the end this worked, does 
> this still work?
>
> We need some kind of debug-tests. I know you have some, and that there 
> are huge differences between gdb versions, but still...


I am away this week; I will check next week.





>> 2)
>> "I shot myself in the foot"
>> Having reported this issue, and it now being fixed, I realized that I 
>> (ab)used the presence of this issue.
>>
>> - ShortString are encoded exactly as
>> record  len: int;   st: array of char;  end;
>> - And in dwarf 3, ansistring are encoded as array.
>>
>> With the only difference that they always had the stride in the 
>> array, and not in the range.
>>
>> FpDebug used the knowledge of this implementation detail (in the hope 
>> that it would not change) to detect the diff between a user defined 
>> record (with the exact same fieldnames), and an actual shortstring.
>>
>> That no longer works
>
> Well. If someone creates a record called 'ShortString', (s)he should 
> not be surprised that the debugger thinks that it is actually a 
> shortstring?
>
> I do not see the issue here. The compiler generates debug-information 
> that makes it possible for any debugger to show the data correctly. 
> For shortstrings it reports a structure with a length and the actual 
> characters. This is what a shortstring is.



Yes, it is correct right now.

But it might be possible to improve, since dwarf has a string type, and 
all we (or I) need to test is whether gdb (nowadays) can display it.


shortstring might still be a record, though...

Btw shortstring has 2 lengths...
  type TS = string[20];
  var s: TS;

can hold 20 chars (important if the debugger wants to change the value)
but
  s := 'abc'
sets the length (s[0]) to 3
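
Both lengths can be read back in a small program. This is only a sketch; the type name TS20 is made up for the example.

```pascal
program ShortStringLengths;
{$mode objfpc}

type
  TS20 = string[20];   { hypothetical type name }

var
  s: TS20;

begin
  s := 'abc';
  WriteLn(SizeOf(s) - 1);   { capacity: 20 characters (plus 1 length byte) }
  WriteLn(Length(s));       { current length: 3 }
  WriteLn(Ord(s[0]));       { the length byte itself: also 3 }
end.
```

A debugger that wants to assign a new value needs the first number; one that only displays the value needs the second.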





>> So I need a new difference, please.
>
> Adding an artificial difference (some sort of implementation-detail) 
> on which some debuggers depend, does not seem to be a good idea, imho.

True, I have one for now ...

>> Ideally using DW_TAG_string_type (available since dwarf 2).
>> I have currently no idea what gdb will do with that.
>> For FpDebug I will have to implement it, but that is no problem.
>
> DW_TAG_string_type is deliberately not used, as it describes a 
> string-type that fpc does not use.

Could you explain?

> If the stride is optional, the compiler should always omit it when not 
> necessary, to decrease the executable (debuginfo) size. What you ask 
> is to add a bug deliberately, which you can use to detect whether 
> something is a string or not.

True. I am not fond of implementation-detail stuff.

But I would very much like a definite specification.

fpc has -godwarfcpp;
it could have -godwarffpd
and then use vendor tags to describe any pascal type that has no exact 
dwarf spec.



fpdebug knows the fpc version that wrote the dwarf, so for older fpc it 
can use the implementation details (they do not change). And if we can 
start to work out proper details for the future, then that will solve it.
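
A version gate of that kind could look roughly like the following. This is only a sketch: TFpcVersion and NeedsStrideWorkaround are hypothetical names, not FpDebug's actual code, and the only fact taken from this thread is the 3.3.0 cut-off.

```pascal
program VersionGateSketch;
{$mode objfpc}

type
  { Hypothetical holder for the producer version read from the debug info }
  TFpcVersion = record
    Major, Minor, Release: Integer;
  end;

{ True when the producing compiler is older than 3.3.0, i.e. when the old
  placement of the stride attribute has to be assumed. The release number
  never matters here because the cut-off release is 0. }
function NeedsStrideWorkaround(const V: TFpcVersion): Boolean;
begin
  if V.Major <> 3 then
    Result := V.Major < 3
  else
    Result := V.Minor < 3;
end;

var
  V: TFpcVersion;

begin
  V.Major := 3; V.Minor := 0; V.Release := 4;
  WriteLn(NeedsStrideWorkaround(V));   { TRUE }
  V.Major := 3; V.Minor := 3; V.Release := 1;
  WriteLn(NeedsStrideWorkaround(V));   { FALSE }
end.
```

If the fix were merged to 3.2, only the comparison in this one function would have to change.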



Re: [fpc-devel] 0036144: Wrong Dwarf2/3/4 info for array (all array, but affects bitpacked) / incorrect use of DW_AT_bit_stride

2019-10-23 Thread Joost van der Sluis

On 13-10-2019 at 00:22, Martin wrote:
> I have a few wishes with regards to: 
> https://bugs.freepascal.org/view.php?id=36144
>
> 1)
> FpDebug detects fpc as dwarf provider, and checks the fpc version. Based 
> on this it can interpret the misplaced tags, and work around the issue.
> I have now configured 3.3.0 as the cut-off for the workaround (since fpc 
> now puts the tags in the correct location).
>
> Should this patch get merged to 3.2 then please alert me of this. So I 
> can adjust the check in FpDebug.


About the patch: I made the original change within the period that I 
tried to get multiple-dimensional arrays to be displayed correctly in 
gdb. Especially arrays of ansistrings. In the end this worked, does this 
still work?


We need some kind of debug-tests. I know you have some, and that there 
are huge differences between gdb versions, but still...



> 2)
> "I shot myself in the foot"
> Having reported this issue, and it now being fixed, I realized that I 
> (ab)used the presence of this issue.
>
> - ShortString are encoded exactly as
> record  len: int;   st: array of char;  end;
> - And in dwarf 3, ansistring are encoded as array.
>
> With the only difference that they always had the stride in the array, 
> and not in the range.
>
> FpDebug used the knowledge of this implementation detail (in the hope 
> that it would not change) to detect the diff between a user defined 
> record (with the exact same fieldnames), and an actual shortstring.
>
> That no longer works

Well. If someone creates a record called 'ShortString', (s)he should not 
be surprised that the debugger thinks that it is actually a shortstring?


I do not see the issue here. The compiler generates debug-information 
that makes it possible for any debugger to show the data correctly. For 
shortstrings it reports a structure with a length and the actual 
characters. This is what a shortstring is.


That some debuggers, specially made for fpc (like fpdebug, but in some 
regard this holds for gdb too) show a more convenient format, is nice. 
But I think this is not relevant for the compiler.



> So I need a new difference, please.

Adding an artificial difference (some sort of implementation-detail) on 
which some debuggers depend, does not seem to be a good idea, imho.

> Ideally using DW_TAG_string_type (available since dwarf 2).
> I have currently no idea what gdb will do with that.
> For FpDebug I will have to implement it, but that is no problem.

DW_TAG_string_type is deliberately not used, as it describes a 
string-type that fpc does not use.

> If that is not an option, can we go for a simpler (implementation detail 
> (yes again)) workaround (that then goes into trunk, and/or if the 
> original fix is merged, can be merged too):
> - The stride is optional. If absent it is equal to the element size 
> (shortstring = char = byte)
>
> - Arrays always have a stride
> - Drop it from the strings array
> And I can then detect that.


If the stride is optional, the compiler should always omit it when not 
necessary, to decrease the executable (debuginfo) size. What you ask is 
to add a bug deliberately, which you can use to detect whether something 
is a string or not.
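
The rule under discussion (an absent stride means the elements sit back to back at their own size) can be written down as a small helper. The names below are hypothetical and the sketch is not taken from FPC or FpDebug; the stride is counted in bits, as DW_AT_bit_stride is.

```pascal
program StrideDefaultSketch;
{$mode objfpc}

const
  NoStride = -1;   { hypothetical marker: attribute not present }

{ Returns the distance between consecutive elements in bits. }
function EffectiveBitStride(DeclaredBitStride, ElemSizeBytes: Integer): Integer;
begin
  if DeclaredBitStride = NoStride then
    Result := ElemSizeBytes * 8     { default: element size }
  else
    Result := DeclaredBitStride;    { explicit stride wins }
end;

begin
  WriteLn(EffectiveBitStride(NoStride, 1));   { char array without stride: 8 }
  WriteLn(EffectiveBitStride(3, 1));          { bitpacked 3-bit elements: 3 }
end.
```

Since the default already yields the right value for a plain array of char, a reader of the debug info gets the same layout whether or not the attribute is written out.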


Regards,

Joost.




Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread Florian Klämpfl
On 23 October 2019 at 01:14:03, "J. Gareth Moreton" wrote:

> That's definitely a marked improvement.  Under the System V ABI and
> vectorcall, both fields of a complex type would be passed through xmm0.
> Splitting it up into two separate registers would require something like:
>
>
> shufpd %xmm0,%xmm1,3 { Copy the high-order Double into the low-order
> position - an immediate operand of "1" will also work, since we're not
> concerned with the upper 64 bits of %xmm1 }
>
>
> After which your compiled code will work correctly (since it looks like
> %xmm1 was undefined before):

The code is correct, on x86_64-linux vectorcall is ignored. Supporting 
vectorcall with my approach would be more difficult.


