Re: [fpc-devel] Extended type
Florian Klaempfl wrote:
> On 19.04.2011 15:18, Marco van de Voort wrote:
>> You'll need a runtime test for SSE3, though, since the first generation of Athlon 64s (Clawhammer and friends, socket 751 or so) doesn't have SSE3.
>
> For such relatively expensive operations, one runtime check per function is IMO OK, all the more since it is predicted perfectly after the first run.

If the branch history table does not overflow ;-)

Micha
___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Extended type
On 21.04.2011 21:14, Micha Nelissen wrote:
> Florian Klaempfl wrote:
>> On 19.04.2011 15:18, Marco van de Voort wrote:
>>> You'll need a runtime test for SSE3, though, since the first generation of Athlon 64s (Clawhammer and friends, socket 751 or so) doesn't have SSE3.
>>
>> For such relatively expensive operations, one runtime check per function is IMO OK, all the more since it is predicted perfectly after the first run.
>
> If the branch history table does not overflow ;-)

If the prediction is thrown out, then the function accounts for no significant part of the program's execution time. Moreover, most CPUs today have SSE3, so the code can take this into account and allow proper static prediction.
Re: [fpc-devel] Extended type
On 20.04.2011 00:05, Hans-Peter Diettrich wrote:
> Florian Klaempfl wrote:
>> Using extended typically only hides bad numerical algorithms. There might be some corner cases where extended is useful, but in general I think it's a matter of bad algorithms.
>
> Some algorithms convert faster with increased accuracy.

I guess you meant converge? This might be true, but processing of extended types is also slower: the memory footprint increases and, even worse, extended arrays are typically aligned to 4- or even 16-byte boundaries, so each element takes 12 or 16 bytes in memory. Furthermore, floating-point operations more complex than +, -, * are also typically slower when the FPU is set to extended precision. So even if an algorithm converges in fewer steps with extended, the overall computation time might not decrease.
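The storage cost described in this message can be checked on a given machine. The following is only a sketch in Python, assuming that `ctypes.c_longdouble` maps to the platform C compiler's `long double` (80-bit x87 extended on most x86 Linux builds, plain 64-bit double with MSVC); the array length `COUNT` is an arbitrary value chosen for illustration.

```python
import ctypes

# Size of one element as the platform's C ABI lays it out in memory.
double_size = ctypes.sizeof(ctypes.c_double)
extended_size = ctypes.sizeof(ctypes.c_longdouble)

# Arbitrary array length, chosen only for illustration.
COUNT = 1000
double_array_bytes = ctypes.sizeof(ctypes.c_double * COUNT)
extended_array_bytes = ctypes.sizeof(ctypes.c_longdouble * COUNT)

print(f"double:      {double_size} bytes per element, {double_array_bytes} bytes per array")
print(f"long double: {extended_size} bytes per element, {extended_array_bytes} bytes per array")
```

Where the two sizes come out equal, the platform treats `long double` as a plain double, which is the Extended=Double situation discussed elsewhere in the thread.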
Re: [fpc-devel] Extended type
On 20.04.2011 11:26, Michael Schnell wrote:
> On 04/19/2011 03:14 PM, Florian Klaempfl wrote:
>> Using extended typically only hides bad numerical algorithms. There might be some corner cases where extended is useful, but in general I think it's a matter of bad algorithms.
>
> Doing things like matrix inversion is of course a good example where a better algorithm helps more than increasing the numeric resolution. But OTOH, when the algorithm is perfect, increased resolution will still give better results.

As I said in my other answer, it is quite likely that using extended precision also increases computation time, so "better result" is relative.
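The point that a better algorithm can help more than extra resolution can be illustrated with summation. This is a sketch, not something taken from the thread: it compares plain accumulation with Kahan (compensated) summation, both using only doubles, against `math.fsum` as a correctly rounded reference. The input data and the function names are made up for the example.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for x in values:
        total += x
    return total

def kahan_sum(values):
    """Compensated (Kahan) summation, still using only doubles."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total

# Illustrative data: one large term followed by many terms that are too
# small to change it when added one at a time.
values = [1.0] + [1e-16] * 10_000
reference = math.fsum(values)  # correctly rounded sum

print("naive:", naive_sum(values))
print("kahan:", kahan_sum(values))
print("fsum: ", reference)
```

The naive loop loses every small term, while the compensated loop stays close to the reference without any wider type.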
Re: [fpc-devel] Extended type
Florian Klämpfl wrote:
> On 20.04.2011 00:05, Hans-Peter Diettrich wrote:
>> Florian Klaempfl wrote:
>>> Using extended typically only hides bad numerical algorithms. There might be some corner cases where extended is useful, but in general I think it's a matter of bad algorithms.
>>
>> Some algorithms convert faster with increased accuracy.
>
> I guess you meant converge?

Right. I had a phone call just while answering :-(

> This might be true, but processing of extended types is also slower: the memory footprint increases and, even worse, extended arrays are typically aligned to 4- or even 16-byte boundaries, so each element takes 12 or 16 bytes in memory.

Please don't mix up the internal processing and the external storage of the values. Type coercion (expansion) is frequently used in the evaluation of expressions, be it inside a GPU or FPU.

> Furthermore, floating-point operations more complex than +, -, * are also typically slower when the FPU is set to extended precision. So even if an algorithm converges in fewer steps with extended, the overall computation time might not decrease.

In contrast, computations with extended precision can eliminate the need for additional checks of intermediate results (overflow, underflow...), which otherwise have to be inserted explicitly.

Of course there is no general rule; which algorithm, precision and type (BCD, fixed point...) yields the best results depends on the concrete purpose of a calculation. But there is also no reason why a coder should be prevented from using existing instructions and data types.

DoDi
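The remark about intermediate overflow checks can be made concrete with a Euclidean norm. The sketch below is an illustration only. Python has no 80-bit float, so `decimal.Decimal` is used as a stand-in for a wider intermediate format (the real x87 extended type has a 15-bit exponent, which is also enough for this case). The function names and the test values are invented for the example.

```python
import math
from decimal import Decimal

def norm_naive(x: float, y: float) -> float:
    """sqrt(x^2 + y^2) with double intermediates; x*x may overflow to inf."""
    return math.sqrt(x * x + y * y)

def norm_scaled(x: float, y: float) -> float:
    """Double-only version that needs an explicit rescaling step."""
    scale = max(abs(x), abs(y))
    if scale == 0.0:
        return 0.0
    xs, ys = x / scale, y / scale
    return scale * math.sqrt(xs * xs + ys * ys)

def norm_wide(x: float, y: float) -> float:
    """Stand-in for a wider intermediate format: Decimal has a far larger exponent range."""
    dx, dy = Decimal(x), Decimal(y)
    return float((dx * dx + dy * dy).sqrt())

x, y = 3e200, 4e200
print("naive: ", norm_naive(x, y))
print("scaled:", norm_scaled(x, y))
print("wide:  ", norm_wide(x, y))
```

The squares exceed the double range, so the naive version fails even though the final result is representable. The scaled version is the kind of explicit check that a wider intermediate type makes unnecessary.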
Re: [fpc-devel] Extended type
On 20.04.2011 15:04, Hans-Peter Diettrich wrote:
> Florian Klämpfl wrote:
>> On 20.04.2011 00:05, Hans-Peter Diettrich wrote:
>>> Florian Klaempfl wrote:
>>>> Using extended typically only hides bad numerical algorithms. There might be some corner cases where extended is useful, but in general I think it's a matter of bad algorithms.
>>>
>>> Some algorithms convert faster with increased accuracy.
>>
>> I guess you meant converge?
>
> Right. I had a phone call just while answering :-(
>
>> This might be true, but processing of extended types is also slower: the memory footprint increases and, even worse, extended arrays are typically aligned to 4- or even 16-byte boundaries, so each element takes 12 or 16 bytes in memory.
>
> Please don't mix up the internal processing and the external storage of the values. Type coercion (expansion) is frequently used in the evaluation of expressions, be it inside a GPU or FPU.

Actually, this is yet another problem of the x87 FPU: expressions are often evaluated more precisely than required, which gives unpredictable results, because it depends on the compiler whether or not it stores a temporary value in memory during the evaluation of an expression. The correct solution, rounding after each operation, is too slow; everything else is unpredictable.
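Why the result depends on whether a temporary is kept in an x87 register or stored to memory can be shown with exact rational arithmetic. The sketch below emulates the effect instead of using real x87 hardware: `round_to_bits` is a helper written for this example (positive values only, no overflow or subnormal handling), 53 bits stands for double and 64 bits for the extended significand.

```python
from fractions import Fraction

def round_to_bits(value: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant bits, ties to even."""
    n, d = value.numerator, value.denominator
    exponent = n.bit_length() - d.bit_length()
    if Fraction(2) ** exponent > value:
        exponent -= 1  # now 2**exponent <= value < 2**(exponent + 1)
    scale = Fraction(2) ** (bits - 1 - exponent)
    # round() on a Fraction rounds half to even and returns an int.
    return Fraction(round(value * scale)) / scale

a = Fraction(1)
b = Fraction(1, 2**53) + Fraction(1, 2**65)  # exactly representable as a double
exact = a + b

# Result rounded straight to double precision.
rounded_once = round_to_bits(exact, 53)
# Result kept in extended precision first, then stored as a double.
rounded_twice = round_to_bits(round_to_bits(exact, 64), 53)

print("rounded once: ", float(rounded_once))
print("rounded twice:", float(rounded_twice))
```

The two paths end on different doubles for the same inputs, which is the kind of compiler-dependent difference described in the message.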
Re: [fpc-devel] Extended type
On Wed, 20 Apr 2011, Hans-Peter Diettrich wrote:
> Of course there is no general rule; which algorithm, precision and type (BCD, fixed point...) yields the best results depends on the concrete purpose of a calculation. But there is also no reason why a coder should be prevented from using existing instructions and data types.

Well... I actually believe compilers should support extended precision. I frequently get Fortran programs to benchmark that use the REAL*10 type.

Do those programmers have good reasons for using REAL*10? Probably not. They use the best precision by default. They code in Fortran because of this kind of support. No, not GNU Fortran, it doesn't support REAL*10, so I need to use the expensive commercial compilers. They don't care, they don't pay for it.

Is it slow? Yes. Do they care? Sometimes. But... parallelizing over 256 cores gives more benefit than using fast double precision. They start asking for government subsidies for the next big supercomputer for the sake of promoting science. That's what your tax money goes to. Shake your head... It's stupid; I have been doing this for a few years already.

But the solution is not to remove extended support from the compiler. Users will walk away.

Daniël
Re: [fpc-devel] Extended type
2011/4/19 Nikolai Zhubr n-a-zh...@yandex.ru:
> Now, with the introduction of 64-bit processors, IIRC AMD took care of this problem by providing some means to execute floating-point operations without the need for the traditional FPU register space, thus avoiding the need to save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on earlier CPUs.

Very interesting -- can you provide further detail on this? I could not find anything relevant in either vol. 1 ch. 6 or vol. 5 ch. 2 of AMD's APM -- is there something I overlooked?

--
Alexander S. Klenin
Re: [fpc-devel] Extended type
On 19 Apr 2011, at 11:43, Alexander Klenin wrote:
> 2011/4/19 Nikolai Zhubr n-a-zh...@yandex.ru:
>> Now, with the introduction of 64-bit processors, IIRC AMD took care of this problem by providing some means to execute floating-point operations without the need for the traditional FPU register space, thus avoiding the need to save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on earlier CPUs.
>
> Very interesting -- can you provide further detail on this? I could not find anything relevant in either vol. 1 ch. 6 or vol. 5 ch. 2 of AMD's APM -- is there something I overlooked?

There are no really new instructions for floating point. However, x86-64 mandates at least SSE2 (while x86 does not), which in turn supports 64-bit floating-point math.

Jonas
Re: [fpc-devel] Extended type
19.04.2011 13:43, Alexander Klenin:
> 2011/4/19 Nikolai Zhubr n-a-zh...@yandex.ru:
>> Now, with the introduction of 64-bit processors, IIRC AMD took care of this problem by providing some means to execute floating-point operations without the need for the traditional FPU register space, thus avoiding the need to save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on earlier CPUs.
>
> Very interesting -- can you provide further detail on this? I could not find anything relevant in either vol. 1 ch. 6 or vol. 5 ch. 2 of AMD's APM -- is there something I overlooked?

Sorry, I looked into it several years ago; I don't have any links at hand anymore. However, Jonas seems to be more exact on this. I think he is right and AMD just pushed the deprecation of x87 in favour of SSE(2).

Nikolai
Re: [fpc-devel] Extended type
On 19.04.2011 12:12, Daniël Mantione wrote:
> On Tue, 19 Apr 2011, Nikolai Zhubr wrote:
>> ms (supposedly) decided to just not preserve FPU/MMX state between 64-bit processes.
>
> MS does preserve FPU states between processes. You can use the x87 on Windows, nothing prevents you from doing so. Maybe the calling convention, but even that you can extend with x87.

FPC still uses the x87 FPU for trig. functions on Win64.

> It's just that the documentation tells you not to use the x87.

Yes, because its strange programming model should really be dropped.
Re: [fpc-devel] Extended type
19.04.2011 14:12, Daniël Mantione:
> MS does preserve FPU states between processes. You can use the x87 on Windows, nothing prevents you from doing so. Maybe the calling

Yes, it does for 32-bit processes on Win64, guaranteed. But do you have any evidence (tests/documents/links) proving it also does so for 64-bit processes on Win64?

> convention, but even that you can extend with x87. It's just that the documentation tells you not to use the x87.
>
> Daniël
Re: [fpc-devel] Extended type
Nikolai Zhubr wrote:
>>>> Originally MS spread the information that it wouldn't work at all under Windows, but that proved to be false; technically, the FPU works. Now MS just states it is unsupported.
>>>
>>> And deprecated: http://msdn.microsoft.com/en-us/library/ee418798(VS.85).aspx#Porting_to_64bit
>>
>> Thanks. I always knew that Windows is not an OS for serious work, but I never heard that from Microsoft so clearly :-(
>
> Not being an MS fan whatsoever, but you all seem to have missed the technical point here. Because x87 (and also MMX in some sense) is a coprocessor (and has its own register space), its full state has to be saved/restored (by the OS) between different running processes in case any process might use FPU/MMX.

The same applies to the XMM/YMM registers. While dropping MMX support is acceptable, in favor of the new vector arithmetic instruction set, I see no point in dropping 80-bit reals before a new 128-bit arithmetic becomes available.

> Clearly this may become rather inefficient performance-wise (because, well, an application might just want to use 2 FPU registers at a time, and the OS will still have to store the whole bunch all the time...) Now, with the introduction of 64-bit processors, IIRC AMD took care of this problem by providing some means to execute floating-point operations without the need for the traditional FPU register space, thus avoiding the need to save/restore FPU state. IIRC these are some _new_ opcodes, unavailable on earlier CPUs.

When AMD aliased the FPU and MMX registers, I don't understand why they *added* new XMM registers, instead of extending the already existing MMX registers - just for fast switching. But it is as it is...

> So, for performance reasons, and because 64-bit applications are (now supposed to be) able to do all floating-point without touching the traditional FPU, MS (supposedly) decided to just not preserve FPU/MMX state between 64-bit processes. That's all.
>
> IMHO it makes some sense actually, though it would be much nicer if there was some option to select this deliberately (say at boot time or whatever).

At least an application should have a chance to specify which register sets have to be saved on a task switch. Unless stated otherwise by MS, the entire state should be saved, as long as x87/MMX is only deprecated, not dropped. Any official information on this issue?

DoDi
Re: [fpc-devel] Extended type
On Tue, 19 Apr 2011, Nikolai Zhubr wrote:
> 19.04.2011 14:12, Daniël Mantione:
>> MS does preserve FPU states between processes. You can use the x87 on Windows, nothing prevents you from doing so. Maybe the calling
>
> Yes, it does for 32-bit processes on Win64, guaranteed. But do you have any evidence (tests/documents/links) proving it also does so for 64-bit processes on Win64?

Not at hand, but don't worry, it does preserve FPU states.

Daniël
Re: [fpc-devel] Extended type
On Tue, 19 Apr 2011, Florian Klämpfl wrote:
>> It's just that the documentation tells you not to use the x87.
>
> Yes, because its strange programming model should really be dropped.

Agreed, but the 80-bit support makes some people want to use it. And it will stay that way until CPU manufacturers invent an alternative.

By the way, recent GCC versions calculate the goniometric functions in software using SSE3, and I checked that this is indeed slightly faster than the x87. So we can get rid of the x87 stuff, should we want to.

Daniël
Re: [fpc-devel] Extended type
On 19.04.2011 12:27, Daniël Mantione wrote:
> On Tue, 19 Apr 2011, Florian Klämpfl wrote:
>>> It's just that the documentation tells you not to use the x87.
>>
>> Yes, because its strange programming model should really be dropped.
>
> Agreed, but the 80-bit support makes some people want to use it. And it will stay that way until CPU manufacturers invent an alternative.

Using extended typically only hides bad numerical algorithms. There might be some corner cases where extended is useful, but in general I think it's a matter of bad algorithms.

> By the way, recent GCC versions calculate the goniometric functions in software using SSE3, and I checked that this is indeed slightly faster than the x87.

I know, but as usual, time etc. ;)
Re: [fpc-devel] Extended type
In our previous episode, Daniël Mantione said:
> By the way, recent GCC versions calculate the goniometric functions in software using SSE3, and I checked that this is indeed slightly faster than the x87. So we can get rid of the x87 stuff, should we want to.

You'll need a runtime test for SSE3, though, since the first generation of Athlon 64s (Clawhammer and friends, socket 751 or so) doesn't have SSE3.

I checked, and 64-bit Pentium Ds do have SSE3; at least mine does:

CPU: Intel(R) Pentium(R) D CPU 2.80GHz (2793.02-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf47  Family = f  Model = 4  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
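The two hexadecimal masks in this dump are the CPUID leaf 1 feature words (`Features` is EDX, `Features2` is ECX), so the SSE3 test comes down to checking bit 0 of ECX. The sketch below decodes the values quoted in the message. The `sin_*` routines and `select_sin` are hypothetical placeholders, not FPC or GCC functions. Selecting the routine once is a variant of the per-call check discussed in the thread, shown here only because it is easy to express in Python.

```python
import math

# Bit positions in the CPUID leaf 1 feature words.
EDX_SSE = 1 << 25    # "Features"  (EDX), bit 25
EDX_SSE2 = 1 << 26   # "Features"  (EDX), bit 26
ECX_SSE3 = 1 << 0    # "Features2" (ECX), bit 0

def has_sse2(features: int) -> bool:
    """True if the SSE2 bit is set in CPUID.1:EDX."""
    return bool(features & EDX_SSE2)

def has_sse3(features2: int) -> bool:
    """True if the SSE3 bit is set in CPUID.1:ECX."""
    return bool(features2 & ECX_SSE3)

def sin_generic(x: float) -> float:
    """Hypothetical fallback path (stands in for the x87 routine)."""
    return math.sin(x)

def sin_sse3(x: float) -> float:
    """Hypothetical fast path (stands in for an SSE3 software routine)."""
    return math.sin(x)

def select_sin(features2: int):
    """Do the feature test once and return the routine to call afterwards."""
    return sin_sse3 if has_sse3(features2) else sin_generic

# Values copied from the Pentium D dump quoted in the message.
FEATURES = 0xBFEBFBFF
FEATURES2 = 0x641D

print("SSE2:", has_sse2(FEATURES))
print("SSE3:", has_sse3(FEATURES2))
print("selected routine:", select_sin(FEATURES2).__name__)
```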
Re: [fpc-devel] Extended type
On 19.04.2011 15:18, Marco van de Voort wrote:
> In our previous episode, Daniël Mantione said:
>> By the way, recent GCC versions calculate the goniometric functions in software using SSE3, and I checked that this is indeed slightly faster than the x87. So we can get rid of the x87 stuff, should we want to.
>
> You'll need a runtime test for SSE3, though, since the first generation of Athlon 64s (Clawhammer and friends, socket 751 or so) doesn't have SSE3.

For such relatively expensive operations, one runtime check per function is IMO OK, all the more since it is predicted perfectly after the first run.

> I checked, and 64-bit Pentium Ds do have SSE3; at least mine does:
>
> CPU: Intel(R) Pentium(R) D CPU 2.80GHz (2793.02-MHz K8-class CPU)
>   Origin = GenuineIntel  Id = 0xf47  Family = f  Model = 4  Stepping = 7
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0x641d<SSE3,DTES64,MON,DS_CPL,CNXT-ID,CX16,xTPR>
>   AMD Features=0x20100800<SYSCALL,NX,LM>
>   AMD Features2=0x1<LAHF>
>   TSC: P-state invariant
Re: [fpc-devel] Extended type
Florian Klaempfl wrote:
> Using extended typically only hides bad numerical algorithms. There might be some corner cases where extended is useful, but in general I think it's a matter of bad algorithms.

Some algorithms convert faster with increased accuracy.

DoDi
Re: [fpc-devel] Extended type
On Mon, 18 Apr 2011, Hans-Peter Diettrich wrote:
> Sven Barth wrote:
>> On Windows 64-bit you must not use the x87 FPU, because Microsoft wants it so.
>
> Can you be a bit more concrete?

Originally MS spread the information that it wouldn't work at all under Windows, but that proved to be false; technically, the FPU works. Now MS just states it is unsupported.

Daniël
Re: [fpc-devel] Extended type
On 18 Apr 2011, at 10:13, Daniël Mantione wrote:
> Originally MS spread the information that it wouldn't work at all under Windows, but that proved to be false; technically, the FPU works. Now MS just states it is unsupported.

And deprecated: http://msdn.microsoft.com/en-us/library/ee418798(VS.85).aspx#Porting_to_64bit

"The x87, MMX, and 3DNow! instruction sets are deprecated in 64-bit modes. The instructions sets are still present for backward compatibility for 32-bit mode; however, to avoid compatibility issues in the future, their use in current and future projects is discouraged."

Jonas
Re: [fpc-devel] Extended type
Jonas Maebe wrote:
> On 18 Apr 2011, at 10:13, Daniël Mantione wrote:
>> Originally MS spread the information that it wouldn't work at all under Windows, but that proved to be false; technically, the FPU works. Now MS just states it is unsupported.
>
> And deprecated: http://msdn.microsoft.com/en-us/library/ee418798(VS.85).aspx#Porting_to_64bit

Thanks. I always knew that Windows is not an OS for serious work, but I never heard that from Microsoft so clearly :-(

DoDi
[fpc-devel] Extended type
Some time ago I heard a rumor that the Extended type is not supported by x86_64 targets. But AFAIK the x87 FPU continues to exist in 64-bit machines, and is still accessible through the well-known coprocessor instruction set.

So what's the current state of floating-point types in FPC?

DoDi
Re: [fpc-devel] Extended type
On 17.04.2011 19:30, Hans-Peter Diettrich wrote:
> Some time ago I heard a rumor that the Extended type is not supported by x86_64 targets. But AFAIK the x87 FPU continues to exist in 64-bit machines, and is still accessible through the well-known coprocessor instruction set.
>
> So what's the current state of floating-point types in FPC?

On Windows 64-bit you must not use the x87 FPU, because Microsoft wants it so. Thus on Win64 Extended=Double. On other x86_64-based operating systems the state might be different. Other CPUs don't even have a coprocessor, or only a vendor-specific one (like some ARMs), and thus the rule Extended=Double applies there as well.

Regards,
Sven
Re: [fpc-devel] Extended type
On Sun, 17 Apr 2011, Sven Barth wrote:
> On Windows 64-bit you must not use the x87 FPU, because Microsoft wants it so. Thus on Win64 Extended=Double. On other x86_64-based operating systems the state might be different.

You can use the x87 on Linux. I don't know about FreeBSD, but I expect so, since it uses the same calling conventions as Linux and x87 is part of those conventions.

Daniël
Re: [fpc-devel] Extended type
Sven Barth wrote:
> On Windows 64-bit you must not use the x87 FPU, because Microsoft wants it so.

Can you be a bit more concrete?

DoDi