Re: [fpc-devel] Extended type

2011-04-21 Thread Micha Nelissen

Florian Klaempfl wrote:

On 19.04.2011 15:18, Marco van de Voort wrote:
You'll need to runtime test for SSE3 though, since the first generation of
athlon64's (clawhammer and friends, socket 754 or so) doesn't have SSE3.


For such relatively expensive operations, one runtime check per 
function is imo ok, even more so since it is predicted perfectly after the 
first run.


If the branch history table does not overflow ;-)

Micha

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Extended type

2011-04-21 Thread Florian Klämpfl
On 21.04.2011 21:14, Micha Nelissen wrote:
 Florian Klaempfl wrote:
 On 19.04.2011 15:18, Marco van de Voort wrote:
 You'll need to runtime test for SSE3 though, since the first
 generation of
 athlon64's (clawhammer and friends, socket 754 or so) doesn't have SSE3.

 For such relatively expensive operations, one runtime check per
 function is imo ok, even more so since it is predicted perfectly after the
 first run.
 
 If the branch history table does not overflow ;-)

If the prediction is thrown out, then the function does not account for a
significant part of the program's execution time. Moreover, most CPUs today
have SSE3, so the code can take this into account and allow proper static
prediction.


Re: [fpc-devel] Extended type

2011-04-20 Thread Florian Klämpfl
On 20.04.2011 00:05, Hans-Peter Diettrich wrote:
 Florian Klaempfl wrote:
 
 Using extended typically hides only bad numerical algorithms. There
 might be some corner cases where extended is useful, but in general I
 think it's a matter of bad algorithms.
 
 Some algorithms convert faster with increased accuracy.

I guess you meant converge? This might be true, but processing of
extended types is also slower: the memory footprint increases and, even
worse, extended arrays are typically aligned to 4- or even 16-byte
boundaries, so each element takes 12 or 16 bytes in memory. Further, floating
point operations more complex than +,-,* are also typically slower when the
fpu is set to extended precision. So even if an algorithm converges in
fewer steps with extended, the overall computation time might not decrease.
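
To put numbers on the footprint argument, here is a minimal Python sketch that rounds the 10-byte extended value up to the alignment and compares the result with an 8-byte double. The array length `N` and the helper name `padded_size` are made up for illustration; the 4 and 16 byte alignments are the ones mentioned above.

```python
def padded_size(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


EXTENDED_BYTES = 10  # an 80-bit x87 extended value
DOUBLE_BYTES = 8
N = 1_000_000  # hypothetical array length

for alignment in (4, 16):
    per_element = padded_size(EXTENDED_BYTES, alignment)
    ratio = (N * per_element) / (N * DOUBLE_BYTES)
    print(f"align {alignment:2d}: {per_element} bytes/element, "
          f"{ratio:.2f}x the size of a double array")
```

So an extended array costs 1.5 to 2 times the memory (and cache) of the same array in double, before any difference in instruction speed is counted.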


Re: [fpc-devel] Extended type

2011-04-20 Thread Florian Klämpfl
On 20.04.2011 11:26, Michael Schnell wrote:
 On 04/19/2011 03:14 PM, Florian Klaempfl wrote:
 Using extended typically hides only bad numerical algorithms. There
 might be some corner cases where extended is useful, but in general I
 think it's a matter of bad algorithms.
 Doing things like Matrix inversion of course is a good example that a
 better algorithm helps more than increasing the numeric resolution. But
 OTOH, when the algorithm is perfect, increased resolution still will
 give better results.

As said in my other answer, it is quite likely that using extended
precision also increases computation time, so "better result" is relative.


Re: [fpc-devel] Extended type

2011-04-20 Thread Hans-Peter Diettrich

Florian Klämpfl wrote:

On 20.04.2011 00:05, Hans-Peter Diettrich wrote:

Florian Klaempfl wrote:


Using extended typically hides only bad numerical algorithms. There
might be some corner cases where extended is useful, but in general I
think it's a matter of bad algorithms.

Some algorithms convert faster with increased accuracy.


I guess you meant converge?


Right. I had a phone call just while answering :-(


This might be true, but processing of
extended types is also slower: the memory footprint increases and, even
worse, extended arrays are typically aligned to 4- or even 16-byte
boundaries, so each element takes 12 or 16 bytes in memory.


Please don't mix up the internal processing and the external storage of 
the values. Type coercion (expansion) is frequently used in the 
evaluation of expressions, be it inside a GPU or FPU.



Further, floating
point operations more complex than +,-,* are also typically slower when the
fpu is set to extended precision. So even if an algorithm converges in
fewer steps with extended, the overall computation time might not decrease.


In contrast, computations with extended precision can eliminate the need 
for additional checks of intermediate results (overflow, underflow...), 
which have to be inserted explicitly in other cases.
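
A small Python sketch of the kind of explicit check meant here. Python floats are doubles, so this only shows the double side: the naive form overflows in the intermediate squares, and the scaled form is the extra work needed to avoid that. With the 15-bit exponent of the x87 extended format the squares of any double value stay in range, so the naive form would do. The function names are made up for illustration.

```python
import math


def naive_norm(x: float, y: float) -> float:
    # x * x overflows to inf once |x| exceeds roughly 1.3e154.
    return math.sqrt(x * x + y * y)


def scaled_norm(x: float, y: float) -> float:
    # The extra check and rescaling needed to stay within double range.
    m = max(abs(x), abs(y))
    if m == 0.0:
        return 0.0
    return m * math.sqrt((x / m) ** 2 + (y / m) ** 2)


print(naive_norm(3e200, 4e200))   # inf: the intermediate squares overflowed
print(scaled_norm(3e200, 4e200))  # close to 5e200
```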


Of course there exists no general rule, it depends on the concrete 
purpose of a calculation, which algorithm, precision and type (BCD, 
fixed point...) yields the best results. But there also exists no 
reason why a coder should be prevented from using existing instructions 
and data types.


DoDi



Re: [fpc-devel] Extended type

2011-04-20 Thread Florian Klämpfl
On 20.04.2011 15:04, Hans-Peter Diettrich wrote:
 Florian Klämpfl wrote:
 On 20.04.2011 00:05, Hans-Peter Diettrich wrote:
 Florian Klaempfl wrote:

 Using extended typically hides only bad numerical algorithms. There
 might be some corner cases where extended is useful, but in general I
 think it's a matter of bad algorithms.
 Some algorithms convert faster with increased accuracy.

 I guess you meant converge?
 
 Right. I had a phone call just while answering :-(
 
 This might be true, but processing of
 extended types is also slower: the memory footprint increases and, even
 worse, extended arrays are typically aligned to 4- or even 16-byte
 boundaries, so each element takes 12 or 16 bytes in memory.
 
 Please don't mix up the internal processing and the external storage of
 the values. Type coercion (expansion) is frequently used in the
 evaluation of expressions, be it inside a GPU or FPU.

Actually, this is yet another problem of the x87 fpu: expressions are
often evaluated more precisely than required, resulting in
unpredictable results because it depends on the compiler whether it stores a
temp. value during evaluation of an expression in memory or not. The
correct solution, rounding after each operation, is too slow; everything
else is unpredictable.
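
The effect can be reproduced without an x87 at hand. The following Python sketch simulates it with exact rational arithmetic: `round_to_bits` is a made-up helper that rounds to a given significand width (ties to even) and ignores the exponent range, which is fine for the small values used here. The same sum is evaluated once with every temporary rounded to double (53 bits) and once with temporaries kept at extended width (64 bits) and only the final result stored as double.

```python
from fractions import Fraction

DOUBLE_BITS, EXTENDED_BITS = 53, 64


def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round x to a significand of `bits` bits, ties to even.

    Assumption: the exponent range is unbounded (no overflow, no subnormals).
    """
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e such that 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    # round() on a Fraction rounds half to even and returns an int.
    return sign * Fraction(round(mag * scale)) / scale


def add_all(values, working_bits: int) -> Fraction:
    """Sum left to right with temporaries of working_bits, store as double."""
    acc = Fraction(0)
    for v in values:
        acc = round_to_bits(acc + v, working_bits)
    return round_to_bits(acc, DOUBLE_BITS)


tiny = Fraction(1, 2 ** 53)
values = [Fraction(1), tiny, tiny]

spilled = add_all(values, DOUBLE_BITS)    # every temp stored as double
in_regs = add_all(values, EXTENDED_BITS)  # temps kept at extended width

print(spilled == in_regs)  # False: 1 versus 1 + 2**-52
```

Which of the two results a program gets on the x87 depends on whether the compiler happened to spill the temporary, which is exactly the unpredictability described above.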


Re: [fpc-devel] Extended type

2011-04-20 Thread Daniël Mantione



On Wed, 20 Apr 2011, Hans-Peter Diettrich wrote:

Of course there exists no general rule, it depends on the concrete purpose of 
a calculation, which algorithm, precision and type (BCD, fixed point...) 
yields the best results. But there also exists no reason why a coder should 
be prevented from using existing instructions and data types.


Well... I actually believe compilers should support extended precision. I 
frequently get Fortran programs that I need to benchmark that use the 
REAL*10 type.


Do those programmers have good reasons for using REAL*10? Probably not. 
They use best precision by default. They code in Fortran because of this 
kind of support. No, not GNU Fortran, it doesn't support REAL*10, so I 
need to use the expensive commercial compilers. They don't care, they 
don't pay for it.


Is it slow? Yes. Do they care? Sometimes. But... parallelizing over 256 
cores gives more benefit than using fast double precision. They start 
asking for government subsidies for the next big supercomputer for the sake of 
promoting science. That's what your tax money goes to.


Shake your head... It's stupid, and I've been doing that for a few years already. 
But the solution is not to remove extended support from the compiler. 
Users will walk away.


Daniël


Re: [fpc-devel] Extended type

2011-04-19 Thread Alexander Klenin
2011/4/19 Nikolai Zhubr n-a-zh...@yandex.ru:
 Now, with the
 introduction of 64-bit processors IIRC AMD took care of this problem by
 providing some means to execute floating-point operations without the need
 for traditional FPU register space, thus allowing to avoid the need to
 save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on
 earlier CPUs.

Very interesting -- can you provide further detail on this?
I could not find anything relevant in either vol.1 ch.6 or vol.5 ch.2 of
AMD's APM -- is there something I overlooked?


-- 
Alexander S. Klenin


Re: [fpc-devel] Extended type

2011-04-19 Thread Jonas Maebe


On 19 Apr 2011, at 11:43, Alexander Klenin wrote:


2011/4/19 Nikolai Zhubr n-a-zh...@yandex.ru:

Now, with the
introduction of 64-bit processors IIRC AMD took care of this problem by
providing some means to execute floating-point operations without the need
for traditional FPU register space, thus allowing to avoid the need to
save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on
earlier CPUs.


Very interesting -- can you provide further detail on this?
I could not find anything relevant in either vol.1 ch.6 or vol.5 ch.2 of
AMD's APM -- is there something I overlooked?


There are no really new instructions for floating point. However,  
x86-64 mandates at least SSE2 (while x86 does not), which in turn  
supports 64 bit floating point math.



Jonas


Re: [fpc-devel] Extended type

2011-04-19 Thread Nikolai Zhubr

19.04.2011 13:43, Alexander Klenin:

2011/4/19 Nikolai Zhubr n-a-zh...@yandex.ru:

Now, with the
introduction of 64-bit processors IIRC AMD took care of this problem by
providing some means to execute floating-point operations without the need
for traditional FPU register space, thus allowing to avoid the need to
save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on
earlier CPUs.


Very interesting -- can you provide further detail on this?
I could not find anything relevant in either vol.1 ch.6 or vol.5 ch.2 of
AMD's APM -- is there something I overlooked?


Sorry, I looked into it several years ago, I don't have any links at 
hand anymore.
However, Jonas seems to be more exact on this. I think he is right and 
AMD just pushed deprecation of x87 in favour of SSE(2).


Nikolai







Re: [fpc-devel] Extended type

2011-04-19 Thread Florian Klämpfl
On 19.04.2011 12:12, Daniël Mantione wrote:
 
 
 On Tue, 19 Apr 2011, Nikolai Zhubr wrote:
 
 ms (supposedly) decided to just not preserve FPU/MMX state between
 64-bit processes.
 
 MS does preserve FPU states between processes. You can use the x87 on
 Windows, nothing prevents you from doing so. Maybe the calling
 convention, but even that you can extend with x87.

FPC still uses the x87 FPU for trig. functions on Win64.

 
 It's just that the documentation tells you not to use the x87.

Yes, because its strange programming model should really be dropped.


Re: [fpc-devel] Extended type

2011-04-19 Thread Nikolai Zhubr

19.04.2011 14:12, Daniël Mantione:


MS does preserve FPU states between processes. You can use the x87 on
Windows, nothing prevents you from doing so. Maybe the calling


Yes it does for 32-bit processes on win64, guaranteed.
But do you have any evidence (tests/documents/links) proving it also 
does so for 64-bit processes on win64?



convention, but even that you can extend with x87.

It's just that the documentation tells you not to use the x87.

Daniël





Re: [fpc-devel] Extended type

2011-04-19 Thread Hans-Peter Diettrich

Nikolai Zhubr wrote:


Originally MS spread info it wouldn't work at all under Windows, but
that proved to be false,
the FPU works technically. Now MS just states it is unsupported.


And deprecated:
http://msdn.microsoft.com/en-us/library/ee418798(VS.85).aspx#Porting_to_64bit 





Thanks. I always knew that Windows is not an OS for serious work, but I
never heard that from Microsoft so clearly :-(


Not being an ms fan whatsoever, but you all seem to have missed the 
technical point here.


Because x87 (and also MMX in some sense) is a coprocessor (and has its 
own register space) its full state has to be saved/restored (by an OS) 
between different running processes in case any process might use 
fpu/mmx.


The same applies to the XMM/YMM registers. While dropping MMX support is 
acceptable, in favor of the new vector arithmetic instruction set, I see 
no point in dropping 80 bit reals before a new 128 bit arithmetic 
becomes available.


Clearly this may become rather inefficient performance-wise 
(because, well, an application might just want to use 2 fpu registers at 
a time, and OS will still have to store the whole bunch all the time...) 
Now, with the introduction of 64-bit processors IIRC AMD took care of 
this problem by providing some means to execute floating-point 
operations without the need for traditional FPU register space, thus 
allowing to avoid the need to save/restore FPU state. IIRC these are 
some _new_ opcodes, unavailable on earlier CPUs.


Given that Intel aliased the MMX registers onto the FPU registers, I don't 
understand why they *added* new XMM registers, instead of extending the 
already existing MMX registers - just for fast switching. But it is as it is...


So, for performance reasons, and because 64-bit applications (are now 
supposed to be) able to do all floating-point without touching the 
traditional FPU, ms (supposedly) decided to just not preserve FPU/MMX 
state between 64-bit processes. That's all. IMHO it makes some sense 
actually, though it would be much nicer if there was some option to 
select this deliberately (say at boot time or whatever).


At least an application should have a chance to specify which register 
sets have to be saved on a task switch. Unless stated otherwise by MS, 
the entire state should be saved, as long as x87/MMX is only deprecated, 
not dropped. Any official information on this issue?


DoDi



Re: [fpc-devel] Extended type

2011-04-19 Thread Daniël Mantione



On Tue, 19 Apr 2011, Nikolai Zhubr wrote:


19.04.2011 14:12, Daniël Mantione:


MS does preserve FPU states between processes. You can use the x87 on
Windows, nothing prevents you from doing so. Maybe the calling


Yes it does for 32-bit processes on win64, guaranteed.
But do you have any evidence (tests/documents/links) proving it also does so 
for 64-bit processes on win64?


Not at hand, but don't worry, it does preserve FPU states.

Daniël


Re: [fpc-devel] Extended type

2011-04-19 Thread Daniël Mantione



On Tue, 19 Apr 2011, Florian Klämpfl wrote:


It's just that the documentation tells you not to use the x87.


Yes, because its strange programming model should really be dropped.


Agreed, but the 80 bit support makes some people want to use it. And that 
will stay this way until CPU manufacturers invent an alternative.


By the way, recent GCC versions calculate the goniometric functions in 
software using SSE3, and I checked that this is indeed slightly faster 
than the x87. So we can get rid of the x87 stuff, should we want.


Daniël


Re: [fpc-devel] Extended type

2011-04-19 Thread Florian Klaempfl

On 19.04.2011 12:27, Daniël Mantione wrote:



On Tue, 19 Apr 2011, Florian Klämpfl wrote:


It's just that the documentation tells you not to use the x87.


Yes, because its strange programming model should really be dropped.


Agreed, but the 80 bit support makes some people want to use it. And that
will stay this way until CPU manufacturers invent an alternative.


Using extended typically hides only bad numerical algorithms. There 
might be some corner cases where extended is useful, but in general I 
think it's a matter of bad algorithms.




By the way, recent GCC versions calculate the goniometric functions in
software using SSE3, and I checked that this is indeed slightly faster
than the x87.


I know but as usual, time etc ;)


Re: [fpc-devel] Extended type

2011-04-19 Thread Marco van de Voort
In our previous episode, Daniël Mantione said:
 
 By the way, recent GCC versions calculate the goniometric functions in 
 software using SSE3, and I checked that this is indeed slightly faster 
 than the x87. So we can get rid of the x87 stuff, should we want.

You'll need to runtime test for SSE3 though, since the first generation of
athlon64's (clawhammer and friends, socket 754 or so) doesn't have SSE3.

I checked and 64-bit Pentium-D's do have SSE3, at least mine does:

CPU:
Intel(R) Pentium(R) D CPU 2.80GHz (2793.02-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf47  Family = f  Model = 4  Stepping = 7

  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
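
For reference, SSE3 is bit 0 of the Features2 word (ECX of CPUID leaf 1). Below is a minimal Python sketch of the "test at runtime, then dispatch" idea. `sin_x87` and `sin_sse3` are hypothetical stand-ins that both just call `math.sin`, and a real implementation would execute CPUID itself instead of taking the flags word as a parameter.

```python
import math

SSE3_BIT = 1 << 0  # CPUID leaf 1, ECX bit 0 signals SSE3


def has_sse3(features2: int) -> bool:
    """Return True if the CPUID.1:ECX value reports SSE3."""
    return bool(features2 & SSE3_BIT)


def sin_x87(x: float) -> float:
    # Hypothetical stand-in for an x87 (fsin) based routine.
    return math.sin(x)


def sin_sse3(x: float) -> float:
    # Hypothetical stand-in for a software routine using SSE3.
    return math.sin(x)


def select_sin(features2: int):
    """Pick the implementation once, so callers pay no per-call check."""
    return sin_sse3 if has_sse3(features2) else sin_x87


fast_sin = select_sin(0x641D)  # Features2 value from the Pentium D above
print(fast_sin.__name__)       # sin_sse3
```

Selecting the routine once at startup is one option; a check inside each function works too and only differs in where the branch sits.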


Re: [fpc-devel] Extended type

2011-04-19 Thread Florian Klaempfl

On 19.04.2011 15:18, Marco van de Voort wrote:

In our previous episode, Daniël Mantione said:


By the way, recent GCC versions calculate the goniometric functions in
software using SSE3, and I checked that this is indeed slightly faster
than the x87. So we can get rid of the x87 stuff, should we want.


You'll need to runtime test for SSE3 though, since the first generation of
athlon64's (clawhammer and friends, socket 754 or so) doesn't have SSE3.


For such relatively expensive operations, one runtime check per 
function is imo ok, even more so since it is predicted perfectly after the 
first run.




I checked and 64-bit Pentium-D's do have SSE3, at least mine does:

CPU:
Intel(R) Pentium(R) D CPU 2.80GHz (2793.02-MHz K8-class CPU)
   Origin = GenuineIntel  Id = 0xf47  Family = f  Model = 4  Stepping = 7

   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
   AMD Features=0x20100800<SYSCALL,NX,LM>
   AMD Features2=0x1<LAHF>
   TSC: P-state invariant


Re: [fpc-devel] Extended type

2011-04-19 Thread Hans-Peter Diettrich

Florian Klaempfl wrote:

Using extended typically hides only bad numerical algorithms. There 
might be some corner cases where extended is useful, but in general I 
think it's a matter of bad algorithms.


Some algorithms convert faster with increased accuracy.

DoDi



Re: [fpc-devel] Extended type

2011-04-18 Thread Daniël Mantione



On Mon, 18 Apr 2011, Hans-Peter Diettrich wrote:


Sven Barth wrote:

On Windows 64-bit you must not use the x87 FPU, because Microsoft wants it 
so.


Can you be a bit more concrete?


Originally MS spread info it wouldn't work at all under Windows, but that 
proved to be false, the FPU works technically. Now MS just states it is 
unsupported.


Daniël


Re: [fpc-devel] Extended type

2011-04-18 Thread Jonas Maebe


On 18 Apr 2011, at 10:13, Daniël Mantione wrote:

Originally MS spread info it wouldn't work at all under Windows, but
that proved to be false, the FPU works technically. Now MS just states
it is unsupported.


And deprecated: 
http://msdn.microsoft.com/en-us/library/ee418798(VS.85).aspx#Porting_to_64bit

The x87, MMX, and 3DNow! instruction sets are deprecated in 64-bit  
modes. The instructions sets are still present for backward  
compatibility for 32-bit mode; however, to avoid compatibility issues  
in the future, their use in current and future projects is discouraged.



Jonas


Re: [fpc-devel] Extended type

2011-04-18 Thread Hans-Peter Diettrich

Jonas Maebe wrote:


On 18 Apr 2011, at 10:13, Daniël Mantione wrote:

Originally MS spread info it wouldn't work at all under Windows, but
that proved to be false, the FPU works technically. Now MS just states
it is unsupported.


And deprecated: 
http://msdn.microsoft.com/en-us/library/ee418798(VS.85).aspx#Porting_to_64bit


Thanks. I always knew that Windows is not an OS for serious work, but I 
never heard that from Microsoft so clearly :-(


DoDi



[fpc-devel] Extended type

2011-04-17 Thread Hans-Peter Diettrich
Some time ago I've heard a rumor that the Extended type is not supported 
by x86_64 targets. But AFAIK the x87 FPU continues to exist in 64 bit 
machines, and is still accessible by the well known coprocessor 
instruction set.


So what's the current state of floating point types in FPC?

DoDi



Re: [fpc-devel] Extended type

2011-04-17 Thread Sven Barth

On 17.04.2011 19:30, Hans-Peter Diettrich wrote:

Some time ago I've heard a rumor that the Extended type is not supported
by x86_64 targets. But AFAIK the x87 FPU continues to exist in 64 bit
machines, and is still accessible by the well known coprocessor
instruction set.

So what's the current state of floating point types in FPC?


On Windows 64-bit you must not use the x87 FPU, because Microsoft wants 
it so. Thus on Win64 Extended=Double.


On other x86_64 based operating systems the state might be different.

Other CPUs don't even have a coprocessor, or only a vendor-specific one 
(like some ARMs), and thus the rule Extended=Double applies there as well.


Regards,
Sven


Re: [fpc-devel] Extended type

2011-04-17 Thread Daniël Mantione



On Sun, 17 Apr 2011, Sven Barth wrote:

On Windows 64-bit you must not use the x87 FPU, because Microsoft wants it 
so. Thus on Win64 Extended=Double.


On other x86_64 based operating systems the state might be different.


You can use the x87 on Linux. I don't know about FreeBSD, but I expect so, 
since it uses the same calling conventions as Linux and x87 is part of 
those conventions.


Daniël


Re: [fpc-devel] Extended type

2011-04-17 Thread Hans-Peter Diettrich

Sven Barth wrote:

On Windows 64-bit you must not use the x87 FPU, because Microsoft wants 
it so.


Can you be a bit more concrete?

DoDi
